Commit Graph

5582 Commits

Author SHA1 Message Date
orbiter
d49238a637 more performance hacks: better default values for scaling, less memory usage
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5708 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-13 10:07:04 +00:00
orbiter
39644dc14e performance hacks to compare methods in database core
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5707 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-13 09:30:19 +00:00
orbiter
e2e7949feb replaced old PPM computation with a better one that simply sums up events that had been stored in the profiling table.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5706 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-13 00:13:47 +00:00
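A minimal sketch of the idea in this entry, assuming that PPM means pages per minute, that every profiling event is one timestamp in milliseconds, and that events arrive in time order. The class `PpmCounter` and its methods are hypothetical; they are not YaCy's profiling API.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: PPM as the number of profiling events in the last minute. */
class PpmCounter {
    private static final long WINDOW_MS = 60_000L;
    // Event timestamps in milliseconds, oldest first (assumed to arrive in order).
    private final Deque<Long> events = new ArrayDeque<>();

    /** Record one event (for example one indexed page) at the given time. */
    synchronized void addEvent(long timeMillis) {
        events.addLast(timeMillis);
    }

    /** Sum up the events inside the one-minute window ending at nowMillis. */
    synchronized int ppm(long nowMillis) {
        // Drop events that are one minute old or older; they no longer count.
        while (!events.isEmpty() && events.peekFirst() <= nowMillis - WINDOW_MS) {
            events.removeFirst();
        }
        return events.size();
    }
}
```

Counting stored events keeps the result exact for the window, at the cost of holding one minute of timestamps in memory.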
orbiter
f6d989aa04 added new class RowSetArray, which arranges RowSet objects like elements in a hashtable but still provides sorted enumeration. The new class is now integrated into ObjectIndexCache, the core class that provides index functions to all database files. The new index access is about twice as fast as before, which noticeably speeds up all parts of YaCy.
The speed of the kelondro indexing class ObjectIndexCache can be compared with Java's standard TreeMap using the main method in IntegerHandleIndex. The result is that kelondro indexing needs only 1/5 of the memory that TreeMap uses! In exchange, the kelondro classes are about four (!) times slower than TreeMap. This is acceptable because the better use of memory is a strong advantage: it makes it possible for YaCy to maintain a large number of documents (> 50 million) in one peer.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5705 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-12 23:05:18 +00:00
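One plausible way to arrange sorted sets "like elements in a hashtable" while keeping sorted enumeration: the key's hash selects one of several small sorted buckets, and sorted iteration performs a k-way merge over all buckets. This is an illustration under assumptions, not the kelondro code: the class `HashedSortedSet` is hypothetical, it stores `String` keys instead of RowSet rows, and `TreeSet` stands in for RowSet.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;
import java.util.TreeSet;

/** Hypothetical sketch: sorted buckets addressed by hash, with a merged sorted iterator. */
class HashedSortedSet implements Iterable<String> {
    private final List<TreeSet<String>> buckets = new ArrayList<>();

    HashedSortedSet(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) buckets.add(new TreeSet<>());
    }

    // Like a hashtable: the key's hash picks the bucket, so a lookup
    // searches one small sorted set instead of one large one.
    private TreeSet<String> bucketFor(String key) {
        return buckets.get(Math.floorMod(key.hashCode(), buckets.size()));
    }

    boolean add(String key) { return bucketFor(key).add(key); }

    boolean contains(String key) { return bucketFor(key).contains(key); }

    int size() {
        int n = 0;
        for (TreeSet<String> b : buckets) n += b.size();
        return n;
    }

    /** Sorted enumeration: k-way merge of the per-bucket sorted iterators. */
    @Override
    public Iterator<String> iterator() {
        // Each queue entry holds the current head of one bucket plus that bucket's iterator.
        PriorityQueue<Head> heads = new PriorityQueue<Head>((a, b) -> a.value.compareTo(b.value));
        for (TreeSet<String> b : buckets) {
            Iterator<String> it = b.iterator();
            if (it.hasNext()) heads.add(new Head(it.next(), it));
        }
        return new Iterator<String>() {
            public boolean hasNext() { return !heads.isEmpty(); }

            public String next() {
                Head h = heads.poll();
                if (h == null) throw new NoSuchElementException();
                // Refill the queue with the next element of the same bucket.
                if (h.rest.hasNext()) heads.add(new Head(h.rest.next(), h.rest));
                return h.value;
            }
        };
    }

    private static final class Head {
        final String value;
        final Iterator<String> rest;

        Head(String value, Iterator<String> rest) { this.value = value; this.rest = rest; }
    }
}
```

Lookups touch one bucket only; the cost moves to enumeration, where the heap-based merge costs O(log k) per element for k buckets.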
borg-0300
0a2fabeef3 static TMPDIR
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5704 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-12 16:23:12 +00:00
lotus
9f7e62e900 refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5703 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-12 16:20:04 +00:00
lotus
f35dc11dc4 allow crawl start from pages with script tags
http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1910

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5702 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-12 16:12:50 +00:00
orbiter
6958eff196 removed unnecessary exceptions, extended testing in IntegerHandleIndex
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5701 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-12 07:35:17 +00:00
orbiter
13c666adef performance hack to ObjectIndex put() method:
The Java standard Map interface has a put() method that returns the value replaced by the put call. The kelondro ObjectIndex defined its put method the same way, so it also returned the previous Entry object. However, in most cases the calling code did not use this value. This change implements a put method that does not return the previous value, reflecting the common use and giving some performance benefit. The functionality to get the previous value is still maintained and is provided by a new 'replace' method.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5700 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-11 20:23:19 +00:00
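A minimal sketch of the interface split described above. `SimpleIndex` is hypothetical and backed by a `HashMap`, so it gains nothing from the split itself; a saving only appears when the backing store would otherwise have to build a new object for the previous value, as with the Entry objects mentioned in the commit.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of an index with a void put() and a value-returning replace(). */
class SimpleIndex<K, V> {
    private final Map<K, V> map = new HashMap<>();

    /** Common case: store the entry; the caller does not need the old value. */
    void put(K key, V value) {
        map.put(key, value);
    }

    /** Rare case: store the entry and return the value it replaced, or null. */
    V replace(K key, V value) {
        return map.put(key, value);
    }

    V get(K key) {
        return map.get(key);
    }
}
```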
orbiter
1f1be1518c added stub for another performance hack: concurrent indexes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5699 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-11 15:52:03 +00:00
orbiter
3e4c28e188 enhanced count feature for kelondroRowSet. It is about twice as fast as before, which should halve the time needed for the collection analysis.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5698 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-11 15:10:38 +00:00
orbiter
84e37387a2 fix for last commit and more testing stub
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5697 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-11 09:16:46 +00:00
orbiter
ca006c506d stub for performance enhancements for RowSet (no functional change yet)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5696 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-11 08:55:43 +00:00
orbiter
d988204875 better shutdown of tools
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5695 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-10 23:17:13 +00:00
orbiter
100247bdda also added an export and a delete feature to URLAnalysis. This completes the clean-up feature for URLs. To do a complete clean-up of the URL database, run the following:
java -Xmx1000m -cp classes de.anomic.data.URLAnalysis -incollection DATA/INDEX/freeworld/TEXT/RICOLLECTION used.dump
java -Xmx1000m -cp classes de.anomic.data.URLAnalysis -diffurlcol DATA/INDEX/freeworld/TEXT used.dump diffurlcol.dump
java -Xmx1000m -cp classes de.anomic.data.URLAnalysis -export DATA/INDEX/freeworld/TEXT xml urls.xml diffurlcol.dump
java -Xmx1000m -cp classes de.anomic.data.URLAnalysis -delete DATA/INDEX/freeworld/TEXT diffurlcol.dump

The export feature is optional; its purpose is to back up the URLs that are about to be deleted. The export function can also create HTML files with embedded links and simple text files: simply replace the word 'xml' with 'html' or 'text'. The last argument in the call, the diffurlcol.dump value, can also be omitted; in that case the complete URL database is exported. This is an alternative to the export function in the web interface.

The delete feature is the only destructive method of the four presented here. Please use it with care, and make a back-up of the URL database files before starting the deletion.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5694 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-10 20:52:10 +00:00
hermens
8c60d6d117 In DHT selection delete only those references that were actually selected
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5693 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-10 13:56:30 +00:00
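A sketch of the behaviour named in this entry, assuming references are held in a list and at most `maxCount` of them are selected for transfer. `DhtSelection.selectAndRemove` is a hypothetical name; the commit does not describe YaCy's actual data structures.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch: select references for transfer and delete exactly those. */
final class DhtSelection {
    private DhtSelection() {}

    /** Moves at most maxCount references out of the container and returns them. */
    static <T> List<T> selectAndRemove(List<T> container, int maxCount) {
        List<T> selected = new ArrayList<>();
        Iterator<T> it = container.iterator();
        while (it.hasNext() && selected.size() < maxCount) {
            selected.add(it.next());
            // Remove only the element just selected; unselected references stay.
            it.remove();
        }
        return selected;
    }
}
```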
orbiter
60078cf322 added next tool for URL analysis: check for references that occur in the URL-DB but not in the RICOLLECTIONS.
To use this, you must first run the -incollection command (see SVN 5687), and you need the used.dump file produced by that process.

Now you can use that file to compare URL hashes with the URLs in the URL-DB. To do that, execute
java -Xmx1000m -cp classes de.anomic.data.URLAnalysis -diffurlcol DATA/INDEX/freeworld/TEXT used.dump diffurlcol.dump
or use different names for the dump files or more memory.

As a result, you get the file diffurlcol.dump which contains all the url hashes that occur in the URL database, but not in the collections.
The file has the format
{hash-12}*
that means: 12-byte hashes are listed one after another without any separator.

The next step could be to process this file and delete all these URLs with the computed hashes, or to export them before deletion.



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5692 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-10 13:38:40 +00:00
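A sketch of the `{hash-12}*` format and of the difference computed by `-diffurlcol`. Assumptions: hashes are treated as opaque 12-byte values, used.dump is assumed to use the same format as diffurlcol.dump (the commit only specifies the latter), and the URL-DB is represented by a plain list of hashes. `HashDump` and its methods are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the {hash-12}* dump format and the URL-DB minus collections difference. */
final class HashDump {
    static final int HASH_LENGTH = 12;

    private HashDump() {}

    /** Split a dump into 12-byte hashes; there is no separator between entries. */
    static List<String> read(byte[] dump) {
        if (dump.length % HASH_LENGTH != 0) {
            throw new IllegalArgumentException("dump length is not a multiple of 12");
        }
        List<String> hashes = new ArrayList<>();
        for (int i = 0; i < dump.length; i += HASH_LENGTH) {
            // ISO-8859-1 maps every byte to one char, so the 12 bytes survive unchanged.
            hashes.add(new String(dump, i, HASH_LENGTH, StandardCharsets.ISO_8859_1));
        }
        return hashes;
    }

    /** Concatenate 12-byte hashes without any separator. */
    static byte[] write(Iterable<String> hashes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String h : hashes) {
            byte[] b = h.getBytes(StandardCharsets.ISO_8859_1);
            if (b.length != HASH_LENGTH) throw new IllegalArgumentException("not a 12-byte hash: " + h);
            out.write(b, 0, b.length);
        }
        return out.toByteArray();
    }

    /** Hashes present in the URL database but absent from the collections (used.dump). */
    static List<String> diff(Iterable<String> urlDbHashes, byte[] usedDump) {
        Set<String> used = new HashSet<>(read(usedDump));
        List<String> unused = new ArrayList<>();
        for (String h : urlDbHashes) {
            if (!used.contains(h)) unused.add(h);
        }
        return unused;
    }
}
```

Holding the used hashes in a set makes each URL-DB lookup constant time on average, which is why the whole used.dump needs to fit into the heap given by `-Xmx`.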
orbiter
b1ddc4a83f do not merge collections if ram == false
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5691 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-09 23:38:29 +00:00
orbiter
dbdd10da84 better logging and startup behaviour for referenceHash computation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5690 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-09 22:32:04 +00:00
borg-0300
d612430fce copy-paste mistake
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5689 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-09 19:05:36 +00:00
borg-0300
acbdac1b67 corrected examples
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5688 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-09 18:37:50 +00:00
orbiter
d64836c34f added statistical analysis of URL references
use it with the following command on a Linux shell:
java -Xmx1000m -cp classes de.anomic.data.URLAnalysis -incollection DATA/INDEX/freeworld/TEXT/RICOLLECTION used.dump
for freeworld indexes.
For more details, please see the discussion at:
http://forum.yacy-websuche.de/viewtopic.php?p=13204#p13204


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5687 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-09 10:43:28 +00:00
orbiter
3b28daab40 code-beautification (to be consistent with external documentation paper)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5686 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-09 10:24:15 +00:00
orbiter
396a4451be increased timeout in ViewFile
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5685 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-09 10:16:37 +00:00
orbiter
485c9406e5 fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1915&hilit=&p=13249#p13249
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5684 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-09 10:14:49 +00:00
orbiter
858f800a07 more logging in httpd to detect shutdown cause. See also:
http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1914&hilit=&p=13246#p13246

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5683 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-09 09:56:26 +00:00
orbiter
b80db04667 - refactoring of IntegerHandleIndex and LongHandleIndex (better method names)
- fix for problem in httpdFileHandler: missing close of open files if the template cache was disabled
- more memory for DHT selection required
- stub for URL reference hash statistics in index collections

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5682 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-08 21:37:17 +00:00
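The httpdFileHandler fix concerns files left open when the template cache was disabled. A sketch of the general pattern, using try-with-resources (Java 7 and later; `readAllBytes` needs Java 9) so the stream is closed whether or not a cache is used. `TemplateLoader` is hypothetical and is not the httpdFileHandler code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: load templates with or without a cache, always closing the file. */
class TemplateLoader {
    private final boolean cacheEnabled;
    private final Map<Path, byte[]> cache = new HashMap<>();

    TemplateLoader(boolean cacheEnabled) {
        this.cacheEnabled = cacheEnabled;
    }

    byte[] load(Path file) throws IOException {
        if (cacheEnabled) {
            byte[] cached = cache.get(file);
            if (cached != null) return cached;
        }
        byte[] content;
        // try-with-resources closes the stream on every path: with or without
        // the cache, and also when reading throws an exception.
        try (InputStream in = Files.newInputStream(file)) {
            content = in.readAllBytes();
        }
        if (cacheEnabled) cache.put(file, content);
        return content;
    }
}
```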
apfelmaennchen
9b6fac4a82 RichClient: better handling for small screens/windows
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5681 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-08 08:55:04 +00:00
lotus
92375cfaa3 translation update
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5680 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-07 15:34:54 +00:00
lotus
8ee946bf1d show upnp status
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5679 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-07 15:31:24 +00:00
apfelmaennchen
f0947a20a8 - RichClient: increased ajax timeout to 10 sec.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5678 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-07 11:01:13 +00:00
apfelmaennchen
e73ac67f7e - for testing JsonP cross-domain requests, I added apfelmaennchen-JsonP as a search peer to RichClient
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5677 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-07 10:51:24 +00:00
orbiter
16f5c6a85e fixed merge method initialization in ReferenceContainer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5676 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-07 10:45:14 +00:00
apfelmaennchen
f7fd3d30c2 - various changes to RichClient
- improved search stability

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5675 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-07 10:20:34 +00:00
apfelmaennchen
4f3bdc64b5 - added ?callback= parameter for JsonP support
- this is needed for JSON AJAX cross-domain calls
- see: http://bob.pythonmac.org/archives/2005/12/05/remote-json-jsonp/

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5674 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-07 10:18:47 +00:00
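JSONP works by returning the JSON body wrapped in a call to the function named by the `callback` parameter, so that a `<script>` tag can load it across domains. A minimal sketch; the class `Jsonp` is hypothetical, and the check that the callback is a plain JavaScript identifier is an added safety measure, not something the commit states.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch: wrap a JSON body for a ?callback= (JSONP) request. */
final class Jsonp {
    // Accept only plain (optionally dotted) JavaScript identifiers as callback names,
    // so the parameter cannot inject arbitrary script.
    private static final Pattern CALLBACK =
            Pattern.compile("[A-Za-z_$][A-Za-z0-9_$]*(\\.[A-Za-z_$][A-Za-z0-9_$]*)*");

    private Jsonp() {}

    /** Returns the plain JSON if no callback is given, otherwise callback(json); */
    static String wrap(String json, String callback) {
        if (callback == null || callback.isEmpty()) return json;
        if (!CALLBACK.matcher(callback).matches()) {
            throw new IllegalArgumentException("invalid callback name: " + callback);
        }
        return callback + "(" + json + ");";
    }
}
```

A request with `?callback=handleResult` then receives `handleResult({...});` instead of bare JSON.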
apfelmaennchen
d84264946b - sidebar column for navigators in search results (display=3)
- http://forum.yacy-websuche.de/viewtopic.php?f=9&t=1904#p13195

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5673 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-06 17:51:04 +00:00
orbiter
d7a493b4f5 added experimental timeline api
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5672 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-06 16:01:29 +00:00
orbiter
efcd95dc37 simplification of (internal) query process / refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5671 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-06 15:53:20 +00:00
orbiter
f1b712c29a small corrections to image loading methods in result presentation
especially loading of favicons in search results. This is a fix that
affects only searches in intranet/repository configurations.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5670 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-06 15:39:02 +00:00
orbiter
98f36a801a - small update to search result layout
- some more mime types

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5669 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-06 10:16:14 +00:00
lotus
4f6658b115 * non-sliding api icon
* regex doc links update

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5668 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-05 20:41:30 +00:00
orbiter
d4b56d5819 added more asserts to BLOBHeap.flushBuffer() to fix the problem described in
http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1679&hilit=&p=13109#p13109

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5666 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-03 23:24:19 +00:00
f1ori
c545fcb9fa * add class to handle keys and signatures
* fix bug in serverCharBuffer
* add build-target to sign tar.gz (run ant dist sign)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5665 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-02 13:29:50 +00:00
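The commit does not say which algorithms the new key and signature class uses. A sketch of signing and verifying a release archive's bytes with the standard `java.security` API, assuming RSA keys and `SHA256withRSA`; `ReleaseSigner` is hypothetical.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

/** Hypothetical sketch: sign and verify the bytes of a release archive. */
final class ReleaseSigner {
    // Algorithm choice is an assumption; the commit does not name one.
    static final String ALGORITHM = "SHA256withRSA";

    private ReleaseSigner() {}

    static KeyPair generateKeys() throws GeneralSecurityException {
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
        gen.initialize(2048);
        return gen.generateKeyPair();
    }

    static byte[] sign(byte[] data, PrivateKey key) throws GeneralSecurityException {
        Signature s = Signature.getInstance(ALGORITHM);
        s.initSign(key);
        s.update(data);
        return s.sign();
    }

    static boolean verify(byte[] data, byte[] signature, PublicKey key) throws GeneralSecurityException {
        Signature s = Signature.getInstance(ALGORITHM);
        s.initVerify(key);
        s.update(data);
        return s.verify(signature);
    }
}
```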
orbiter
aa44d9bad9 more refactoring of kelondro.text / deleted de.anomic.index
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5664 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-02 11:04:13 +00:00
orbiter
6ffc6e3389 more refactoring of indexer and kelondro classes;
- integrating the indexer into kelondro as package 'text'
- renaming of classes in kelondro.index

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5663 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-02 10:00:32 +00:00
orbiter
404bc21da9 simplification of (internal) query process / refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5662 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-02 08:48:27 +00:00
orbiter
76ef5f0f14 refactoring of index package: better names for the classes (to be continued)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5661 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-01 23:58:14 +00:00
orbiter
2df57b1fd1 refactoring of index collection class
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5660 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-01 23:07:45 +00:00
lotus
39a177649b * added upnp listener for devices that do not respond to discovery but advertise themselves
* moved package

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5659 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-28 14:36:23 +00:00
orbiter
d1d9fbae5c enabled URLAnalysis to operate on multiple input files; just use a wildcard when calling the class from the command line
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5658 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-26 23:47:41 +00:00
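The commit does not say whether the shell or URLAnalysis expands the wildcard. On a Unix shell an unquoted pattern is expanded before Java starts, so `main` simply receives several file arguments. Where that does not happen, a program can expand a quoted pattern itself, as in this sketch; `InputFiles.expand` is hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: expand a wildcard pattern into the matching input files. */
final class InputFiles {
    private InputFiles() {}

    static List<Path> expand(Path dir, String glob) throws IOException {
        List<Path> files = new ArrayList<>();
        // newDirectoryStream(dir, glob) filters the directory entries with the glob pattern.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) files.add(p);
            }
        }
        Collections.sort(files); // predictable processing order
        return files;
    }
}
```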