Commit Graph

660 Commits

Author SHA1 Message Date
orbiter
b15b059c0d fix for latest commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5813 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-16 19:53:21 +00:00
orbiter
c8624903c6 full redesign of index access data model:
terms (words) are not any more retrieved by their word hash string, but by a byte[] containing the word hash.
this has strong advantages when RWIs are sorted in the ReferenceContainer Cache and compared with the sun.java TreeMap method, which needed getBytes() and new String() transformations before.
Many thousands of such conversions are now omitted every second, which increases the indexing speed by a factor of two.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5812 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-16 15:29:00 +00:00
orbiter
bd5f4c78d8 - added default profile for surrogate indexing
- integrated surrogate indexing into indexing queue process

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5810 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-16 08:01:38 +00:00
orbiter
c08f9b36a4 refactoring of wiki parser.
This was done to prepare the wiki parser as parser for wikipedia dumps, which will be used for performance test (to omit crawling)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5785 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-08 15:28:45 +00:00
orbiter
c2359f20dd refactoring: better abstraction of reference and metadata prototypes.
This is a preparation to introduce other index tables as used now only for reverse text indexes. Next application of the reverse index is a citation index.
Moved to version 0.74

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5777 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-03 13:23:45 +00:00
orbiter
9bfb2641db - removed deprecated threads
- added automatic http client reset. this was necessary because excessive intranet crawling caused deadlocks. this hack solved the problem.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5768 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-01 20:13:57 +00:00
orbiter
b3f75e48fa - enhanced balancer: auto-solving of waiting-deadlocks
- removed deprecated cache-init size value
- more debug lines for IndexCell cache dump merge

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5728 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-18 20:21:19 +00:00
orbiter
a9cea419ef Integration of the new index data structure IndexCell
This is the start of a testing phase for IndexCell data structure which will replace
the collections and caching strategy. IndexCall creation and maintenance is fast, has
no caching overhead, very low IO load and is the basis for the next data structure,
index segments.

IndexCell files are stored at DATA/<network>/TEXT/RICELL
With this commit still the old data structures are used, until a flag in yacy.conf is set.
To switch to the new data structure, set
useCell = true
in yacy.conf. Then you will have no access any more to TEXT/RICACHE and TEXT/RICOLLECTION

This code is still bleeding-edge development. Please do not use the new data structure for
production now. Future versions may have changed data types, or other storage locations.
The next main release will have a migration feature for old data structures.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5724 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-17 13:03:27 +00:00
orbiter
83792d9233 more refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5722 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-16 16:24:53 +00:00
orbiter
209f25f5f5 refactoring to integrate indexCell data structures
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5718 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-16 00:18:37 +00:00
orbiter
7f67238f8b refactoring of plasmaWordIndex: less methods in the class, separated the index to CachedIndexCollection
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5710 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-13 14:56:25 +00:00
orbiter
14a1c33823 refactoring of wordIndex class
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5709 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-13 10:34:51 +00:00
orbiter
e2e7949feb replaced old PPM computation with a better one that simply sums up events that had been stored in the profiling table.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5706 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-13 00:13:47 +00:00
lotus
9f7e62e900 refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5703 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-12 16:20:04 +00:00
orbiter
b80db04667 - refactoring of IntegerHandleIndex and LongHandleIndex (better method names)
- fix for problem in httpdFileHandler: mising close of open Files if tempate cache was disabled
- more memory for DHT selection required
- stub for URL reference hash statistics in index collections

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5682 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-08 21:37:17 +00:00
orbiter
efcd95dc37 simplification of (internal) query process / refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5671 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-06 15:53:20 +00:00
orbiter
aa44d9bad9 more refactoring of kelondro.text / deleted de.anomic.index
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5664 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-02 11:04:13 +00:00
orbiter
76ef5f0f14 refactoring of index package: better names for the classes (to be continued)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5661 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-01 23:58:14 +00:00
orbiter
2df57b1fd1 refactoring of index collection class
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5660 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-01 23:07:45 +00:00
lotus
39a177649b * added upnp listener for devices that do not respond to discovery but advertise themselves
* moved package

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5659 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-28 14:36:23 +00:00
orbiter
c12bb8a6d0 - refactoring of the http client
- added a protection against memory leaks for the access tracker

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5621 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-19 16:24:46 +00:00
orbiter
4db80065ac select more
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5617 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-16 21:53:37 +00:00
orbiter
26978b2a25 - better memory protection in kelondro caches: computation of needed memory for cache grow
- removed excessive gc calls
- step to 16 vertical DHT partitions

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5611 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-15 23:35:59 +00:00
lotus
4aad461100 added UPnP support
YaCy can now automatically forward ports on home routers
off by default

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5609 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-14 13:12:08 +00:00
orbiter
be0c492ae5 fix for memory leak bug in new dht transmissions
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5606 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-14 00:01:05 +00:00
orbiter
40d9849aa4 - better control of chunk size in dht selection
- more restrict values in selection
- step to 4 vertical partitions

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5603 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-12 14:47:32 +00:00
orbiter
333489420b - fix for NPE when loading the cytag image
- some hacks for less memory usage:
-- less usage of buffer and cache memory in EcoFS
-- buffer allocation on-demand in BufferedIOChunks
-- removed largest ybr idx

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5595 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-11 10:52:56 +00:00
orbiter
5393f356aa fix for termination problem
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5589 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-10 01:08:06 +00:00
orbiter
c25c334b75 replaced old DHT transmission method with new method. Many things have changed! some of them:
- after a index selection is made, the index is splitted into its vertical components
- from differrent index selctions the splitted components can be accumulated before they are placed into the transmission queue
- each splitted chunk gets its own transmission thread
- multiple transmission threads are started concurrently
- the process can be monitored with the blocking queue servlet
To implement that, a new package de.anomic.yacy.dht was created. Some old files have been removed.
The new index distribution model using a vertical DHT was implemented. An abstraction of this model
is implemented in the new dht package as interface. The freeworld network has now a configuration
of two vertial partitions; sixteen partitions are planned and will be configured if the process is bug-free.
This modification has three main targets:
- enhance the DHT transmission speed
- with a vertical DHT, a search will speed up. With two partitions, two times. With sixteen, sixteen times.
- the vertical DHT will apply a semi-dht for URLs, and peers will receive a fraction of the overall URLs they received before.
  with two partitions, the fractions will be halve. With sixteen partitions, a 1/16 of the previous number of URLs.
BE CAREFULL, THIS IS A MAJOR CODE CHANGE, POSSIBLY FULL OF BUGS AND HARMFUL THINGS.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5586 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-10 00:06:59 +00:00
orbiter
01b97ef3f8 added new cybertag-tracking feature that was inspired by itgrl
from the forum discussion in
http://forum.yacy-websuche.de/viewtopic.php?p=12612#p12612

The feature will provide two basic entities:
- you can integrate image links which point to your yacy installation anywhere in the web.
  the image can be loaded with
  <img src="http://<yourpeer>:<yourport>/cytag.png?icon=invisible&nick=<yournickname_or_community_id>&tag=<anything>">
  This will place a invisible 1-pixel image. If you change the icon=invisible to icon=redpill, you will see a red pill
  Use this, to track your activity in the web.
- you can view your tracks at
  http://localhost:8080/Tracks.html
- There is a public api to your tracks at
  http://localhost:8080/api/tracks_p.json
  which needs authentication


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5581 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-06 15:06:19 +00:00
orbiter
d3e33fd6c1 removed strange retry logic from DHT transfer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5564 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-01 23:02:40 +00:00
orbiter
94110df85a moved logging partially to kelondro
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5545 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-31 01:06:56 +00:00
orbiter
024da2916b refactoring of logging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5544 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-30 23:33:47 +00:00
orbiter
83ce65707a (almost) completed partition of classes in kelondro
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5543 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-30 22:44:20 +00:00
orbiter
7ee494fde5 more refactoring of kelondro:
- seperated BLOB from table classes
- renamed 'coding' package to 'order'

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5542 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-30 22:08:08 +00:00
orbiter
bf93767ec6 refactoring of kelondro database classes
(to be continued)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5540 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-30 15:33:00 +00:00
orbiter
fc27bf8c4c refactoring of kelondro classes:
kelondro shall become independent from other packages.
moved bytebuffer, date and memory to kelondro

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5539 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-30 14:48:11 +00:00
orbiter
419469ac27 added more methods to control the vertical DHT (not yet active .. )
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5514 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-23 15:32:27 +00:00
orbiter
bed38a5f8c fix for uncaught exception in RSSReader
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5482 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-13 00:20:37 +00:00
orbiter
9ef77d57f5 added an access control to the search interface using white/blacklists:
in the network configuration, you can configure a whiteliste and a blacklist
- blacklistet clients cannot search
- whitelistet client get never any search restrictions
- for all other clients: apply DoS search restrictions
Please see the example configuriation in yacy.network.freeworld.unit
by default, all clients from localhosts get whitlistet.
If you have your own YaCy network, please put all the IPs of your peers into the whitelist

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5475 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-12 10:55:48 +00:00
orbiter
e948df68ac longer timeout for queues during shutdown
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5469 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-11 19:10:09 +00:00
orbiter
b2b7edae18 fixed interactive search
- added dummy servlet class, because otherwise the template engine is not triggered.
thats so because the yacy httpd works much faster as normal file server without a scan
of the served pages. Therefore each page with templates must now have a class file associated to it.
- fixed json output format of yacysearch

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5449 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-06 20:04:09 +00:00
orbiter
c6880ce28b removed the permanent cache flush and replaced it with a periodic cache flush
The cache is now flushed only for one second every ten seconds. During a crawl the cache
fills up completely, and is only flushed if space is needed for more documents.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5446 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-06 13:51:59 +00:00
orbiter
6072831235 no cr transmission for robinson peers
see also: http://forum.yacy-websuche.de/viewtopic.php?p=10290#p10290

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5436 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-03 23:44:42 +00:00
orbiter
b6bba18c37 replaced the storing procedure for the index ram cache with a method that generates BLOBHeap-compatible dumps
this is a migration step to support a new method to store the web index, which will also based on the same data structure. made also a lot of refactoring for a better structuring of the BLOBHeap class.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5430 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-01 22:31:16 +00:00
orbiter
e004da48d3 - added fast fingerprint computation for files (any). Will be used in new index dump method
- refactoring

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5415 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-29 12:22:13 +00:00
orbiter
e34ac22fbd - added new monitoring servlet at
http://localhost:8080/PerformanceConcurrency_p.html
- used the new monitoring to do some fine-tuning of the indexing queue

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5402 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-19 15:26:01 +00:00
orbiter
d376d81fc4 replaced busy thread control of crawl stacker by blocking threads
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5400 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-18 23:18:34 +00:00
orbiter
7535fd7447 - refactoring of CrawlEntry and CrawlStacker
- introduced blocking queues in CrawlStacker to make it ready for concurrency
- added a second busy thread for the CrawlStacker
The CrawlStacker is multithreaded. It shall be transformed into a BlockingThread in another step.
The concurrency of the stacker will hopefully solve some problems with cases where DNS blocks.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5395 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-17 22:53:06 +00:00
orbiter
2802138787 - refactoring of CrawlStacker (to prepare it for new multi-Threading to remove DNS lookup bottleneck)
- fix of shallBeOwnWord target computation heuristic


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5392 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-15 00:02:58 +00:00