Commit Graph

9011 Commits

Author SHA1 Message Date
orbiter
8952153ecf update to Balancer algorithm:
- create a load list from the current list of known hosts
- do not create this list for each Balancer.pop access
- create the list from those hosts which have a zero-waiting time
- select 1/3 from that list which have the most urls waiting
- get hosts from the waiting list in random order
- fixes for some delta-time computations
- always load all urls from hosts which have never been loaded before
2012-10-28 13:24:49 +01:00
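
The selection steps in this commit message can be read as the following minimal sketch. It is an illustration under assumptions, not the Balancer code: `LoadListSketch`, `HostState`, and `buildLoadList` are hypothetical names, the rounding of "1/3" is a guess, and the rule about hosts that have never been loaded before is not modeled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Hypothetical sketch; class, field, and method names are not taken from YaCy.
final class LoadListSketch {

    /** Illustrative per-host state. */
    static final class HostState {
        final String host;
        final int waitingUrls;    // number of urls queued for this host
        final long waitingTimeMs; // remaining delay; 0 means the host may be loaded now

        HostState(String host, int waitingUrls, long waitingTimeMs) {
            this.host = host;
            this.waitingUrls = waitingUrls;
            this.waitingTimeMs = waitingTimeMs;
        }
    }

    /** Builds the load list once, so it is not recomputed on every pop. */
    static List<String> buildLoadList(List<HostState> hosts, Random random) {
        // 1. keep only hosts with zero waiting time that have urls queued
        List<HostState> ready = new ArrayList<>();
        for (HostState h : hosts) {
            if (h.waitingTimeMs == 0 && h.waitingUrls > 0) ready.add(h);
        }
        if (ready.isEmpty()) return new ArrayList<>();

        // 2. order by number of waiting urls, largest first
        ready.sort(Comparator.comparingInt((HostState h) -> h.waitingUrls).reversed());

        // 3. keep the top third (assumption: round down, but keep at least one host)
        int keep = Math.max(1, ready.size() / 3);
        List<String> loadList = new ArrayList<>();
        for (HostState h : ready.subList(0, keep)) loadList.add(h.host);

        // 4. hand the selected hosts out in random order
        Collections.shuffle(loadList, random);
        return loadList;
    }
}
```

Passing the `Random` in as a parameter keeps the sketch testable with a fixed seed; whether the real implementation does so is not stated in the commit message.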
orbiter
354f0d9acd moved static method from ClusteredScoreMap to MapDataMining because it
was not used in the ClusteredScoreMap class but only in MapDataMining
2012-10-28 11:29:53 +01:00
Michael Peter Christen
8e1248ffe3 force a commit in advance of a search for the administrator to get most
recent results even if commit time is high and an indexing is ongoing.
2012-10-26 15:35:42 +02:00
Michael Peter Christen
3b48c78190 added an option to force a commit to solr.
may be used by a search front-end in case that the commitWithinMs time
is too short to get recently indexed documents.
2012-10-26 07:39:07 +02:00
sixcooler
2d972f289a raise commitWithinMs to the default value from SwitchBoard
(results in lower HD I/O)

no dots in the memory graph (there are too many of them)
2012-10-26 02:12:45 +02:00
orbiter
8fde1dd3b6 another performance and memory hack to graphics: this makes it possible
to produce a 100-Megapixel png network graphic image on my 6 year old
laptop in standard configuration in 10 seconds.
2012-10-25 21:40:27 +02:00
Michael Peter Christen
1baf498d59 - show more lines in online log
- reverse order is default now
2012-10-25 18:38:39 +02:00
Michael Peter Christen
55bdafbaf1 more image processing hacks 2012-10-25 18:20:05 +02:00
Michael Peter Christen
f2d0418218 because the new PngEncoder had a problem with the PixelGrabber which is
caused by a JRE bug, the PixelGrabber had to be circumvented using a
custom frame buffer which can be read without a PixelGrabber. This resulted
in ultra-fast and much less memory-consuming transformation. YaCy images
are now generated really fast!
2012-10-25 17:59:20 +02:00
Michael Peter Christen
d5d64019e5 - added a method for the RasterPlotter to draw arrow endings to lines
- replaced the dot in the NetworkGraph with arrows
- enhanced the image drawing speed using pre-computed color values
- added more attention for OOM cases during very large image painting
2012-10-25 16:05:04 +02:00
Michael Peter Christen
342543a6c4 fix for host browser 2012-10-25 10:23:43 +02:00
Michael Peter Christen
85ca07b90e when a new crawl is started, an equal crawl, if still running, is
terminated and the corresponding crawl profile is deleted (this also
clears the crawl queue entries for that crawl profile)
2012-10-25 10:20:55 +02:00
Michael Peter Christen
906e51214a the web structure image shows the pivot dot in a different color 2012-10-25 10:18:28 +02:00
Michael Peter Christen
b3ffcde0c7 - prepared PngEncoder for concurrency: PixelGrabber.grabPixels is the
main time-consuming process. This shall be done in concurrency.
- added concurrent processes to call the PixelGrabber and framework to
do that (queues)

It is now possible to create 4k-Images (3840x2160) e.g. with the Network
Graphics servlet
2012-10-24 02:08:51 +02:00
Michael Peter Christen
e9c6f4ce2e - new order of data computation: first compute the size of
compressed deflater output, then assign an exact-sized byte[] which
makes resizing afterwards superfluous
- after all enhancements all class objects were removed; result is just
one short static method
- made objects final where possible
2012-10-24 00:41:09 +02:00
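
The "compute the size first, then allocate an exact-sized byte[]" idea from this commit can be sketched with `java.util.zip.Deflater` as below. This is an assumption-laden illustration, not the PngEncoder code: `ExactSizeDeflateSketch`, `compressedSize`, and `compressExact` are hypothetical names, and the sketch trades a second compression pass for avoiding buffer resizing.

```java
import java.util.zip.Deflater;

// Hypothetical sketch; not the encoder from the commit.
final class ExactSizeDeflateSketch {

    /** First pass: deflate into a scratch buffer and only count the output bytes. */
    static int compressedSize(byte[] input, Deflater deflater) {
        deflater.reset();
        deflater.setInput(input);
        deflater.finish();
        byte[] scratch = new byte[4096]; // reused for every chunk, content is discarded
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(scratch);
        }
        return total;
    }

    /** Second pass: deflate again into an array of exactly the counted size. */
    static byte[] compressExact(byte[] input) {
        Deflater deflater = new Deflater();
        try {
            int size = compressedSize(input, deflater);
            byte[] out = new byte[size]; // exact size, so no resizing afterwards
            deflater.reset();            // same settings, so the output is identical
            deflater.setInput(input);
            deflater.finish();
            int written = 0;
            while (!deflater.finished() && written < out.length) {
                written += deflater.deflate(out, written, out.length - written);
            }
            return out;
        } finally {
            deflater.end(); // release native zlib resources
        }
    }
}
```

Both passes use the same `Deflater` settings, which is what makes the counted size valid for the second pass.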
orbiter
c6a1b21399 added a 9-year old png encoder from David Eisenberg which I rewrote
quite a bit to remove all code that handles transparency. With this
highly specialized png writer it is possible to write png images much
faster than with the JRE built-in png writer.
In a second step it can be possible to add concurrency to increase
computation speed further.
2012-10-23 23:27:41 +02:00
orbiter
276dd6452b removed warnings 2012-10-23 19:08:44 +02:00
orbiter
59bf4677b6 added option to view the complete directory structure in host browser 2012-10-23 19:02:55 +02:00
Michael Peter Christen
b991685782 Merge branch 'master' of git://gitorious.org/~reger/yacy/bbyacy-rc1 2012-10-23 18:14:58 +02:00
Michael Peter Christen
7602fce0b9 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-10-23 18:12:48 +02:00
Michael Peter Christen
ea11a1efea fix for highlighting in gsa search 2012-10-23 18:11:49 +02:00
Michael Peter Christen
9eaede50e7 enhanced web structure images 2012-10-23 18:11:19 +02:00
Michael Peter Christen
b7ac1da6a3 gsa results shall have only one title in metadata and that should be the
visible title in the <title>-tag
2012-10-23 18:03:12 +02:00
sixcooler
206e7bcf94 whitelist yacyportalsearch aka search.yacy.net 2012-10-23 03:49:27 +02:00
Michael Peter Christen
ae6feb5610 showing the web structure graph as animation in the crawl monitor 2012-10-23 02:50:26 +02:00
reger
87aab9aa7c - fix: with augmented parsing = on; missing metadata in index (like title) due to overwriting metadata by adding multiple result docs from augmentparser with same url
- fix Document.addsubdocuments: sections might be initialized with Arrays.asList, which does not provide the .addAll method that is used
   see e.g. http://kamleshkr.wordpress.com/2010/02/17/inside-java-arrays-aslistt-a/
2012-10-22 22:48:35 +02:00
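
The pitfall behind the second fix can be reproduced in a few lines: the list returned by `Arrays.asList` is a fixed-size view of the array, so `addAll` with a non-empty argument throws `UnsupportedOperationException`. The class and method names below (`FixedSizeListSketch`, `canGrow`, `growableCopy`) are hypothetical and only illustrate the behaviour; they are not the code from the commit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the Arrays.asList pitfall; not YaCy code.
final class FixedSizeListSketch {

    /** Returns true if addAll works on the list, false if the list is fixed-size. */
    static boolean canGrow(List<String> list) {
        try {
            list.addAll(Arrays.asList("extra section"));
            return true;
        } catch (UnsupportedOperationException e) {
            return false; // thrown by the fixed-size list from Arrays.asList
        }
    }

    /** Copies the elements into an ArrayList, which supports addAll. */
    static List<String> growableCopy(String... sections) {
        return new ArrayList<>(Arrays.asList(sections));
    }
}
```

Wrapping the result in `new ArrayList<>(...)` is one common way to get a growable list; the commit message does not say which remedy was chosen.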
Michael Peter Christen
39317a6c66 enhanced webstructure image: introduced
- multiple hosts can be listed (comma-separated) as host argument
- new 'bf' attribute (branch factor): the maximum number of edges per
node
- the bf-value is computed automatically
- ordering of nodes when the graphic is drawn: mostly the drawing ends
with a limitation, e.g. the number of nodes. When this happens, it should be
ensured that more 'interesting' nodes are painted in advance. This is
now done by sorting all nodes by the number of links they have in the
distant sub-graph.
2012-10-22 16:23:39 +02:00
sixcooler
47ae7e322e smaller dhtDispatcher.cloudSize
@Orbiter: we talked about this some time ago - please revert if I'm wrong
2012-10-21 20:05:28 +02:00
sixcooler
57ddd63888 do not hold an expensive cache of references for DHT-out, but load them
on demand
see: http://forum.yacy-websuche.de/viewtopic.php?f=8&t=4530
2012-10-21 20:00:36 +02:00
reger
1dc6482feb format crawler timeout output string in seconds (was days) 2012-10-21 03:00:05 +02:00
Michael Peter Christen
ef937af35d more custom field usage in gsa search result 2012-10-18 15:26:55 +02:00
Michael Peter Christen
ea27d2e5f6 fixed more getSolrFieldName usages 2012-10-18 15:21:05 +02:00
Michael Peter Christen
ce0e5b1e17 - more refactoring / private methods
- fix for usage of custom solr field names
2012-10-18 15:09:04 +02:00
Michael Peter Christen
ccc3760a47 Refactoring and redesign of data architecture to make URIMetadataRow
superfluous. The target is to make a solr document as the core of YaCy
documents which would cause that many conversions can be removed. On the
way to this target the Equivalence of URIMetadataRow and URIMetadataNode
had to be removed to expose the usage of the old URIMetadataRow data
structure.
This refactoring already removes unnecessary conversions and should
make memory usage during indexing lower.
2012-10-18 14:29:11 +02:00
Michael Peter Christen
7f71dfab03 added a HostBrowser.xml api file and changed a bit of attribute naming 2012-10-18 11:42:13 +02:00
Michael Peter Christen
b400fc7b4d fix for file parser problem 2012-10-17 18:06:44 +02:00
Michael Peter Christen
e5b3c172ff removed hack which translated Solr documents to virtual RWI entries
which had been then mixed with remote RWIs. Now these Solr documents are
fed into the result set as they appear during local and remote
search. That makes the search much faster.
2012-10-17 17:45:41 +02:00
Michael Peter Christen
6017691522 added an exception catch 2012-10-17 13:56:11 +02:00
Michael Peter Christen
68c7ed5ce9 added a shell script which can be used to delete the api action steering
table. This may be necessary if the api is called by remote command and
the recordings are not used. Then they can be deleted frequently by
calling this clear command using a cron job
2012-10-17 00:44:16 +02:00
Michael Peter Christen
ed803708ab added a shell script which can be used to add a rss feed to the index.
All pages linked in the rss feed are added. The process is not repeated
automatically. If you want to repeat this, add the command to a cron
job.
2012-10-17 00:31:59 +02:00
Michael Peter Christen
5d16c23a1f specified more URIMetadata as URIMetadataNode 2012-10-16 18:26:21 +02:00
Michael Peter Christen
43f3345c90 - removed dependencies from URIMetadataRow and made direct access to
URIMetadataNode which creates the opportunity to access Solr objects
directly and use their information richness
- lazy initialization of the URIMetadataNode object - should cause less
computation and memory usage during search.
- removed dead code
2012-10-16 18:11:57 +02:00
Michael Peter Christen
cc98496ff3 enhanced the HostBrowser:
- showing also outbound links to other domains if there are any
- the outbound links browser shows also the link structure image
- showing even inbound links if the web structure graph has information
about that
- removed the left menu and made the HostBrowser a part of the top menu
for search
- moved the file search also to the top menu
- added hover information in the HostBrowser to explain what the click
means
- because the HostBrowser also links to the Metadata viewer ViewFile,
there should be a button to switch back to the HostBrowser: added that
also.
2012-10-16 17:13:18 +02:00
Michael Peter Christen
21fe8339b4 - enhanced generation of url objects
- enhanced computation of link structure graphics
- enhanced collection of data for link structures
2012-10-15 13:17:13 +02:00
Michael Peter Christen
4023d88b0b added date info in parser errors 2012-10-15 10:57:36 +02:00
Michael Peter Christen
1b02408936 use less cache 2012-10-11 14:32:37 +02:00
Michael Peter Christen
e45a3235e0 default cache size was much too high; decreased solr cache size 2012-10-11 12:03:48 +02:00
Michael Peter Christen
613cf7da7f enhancement to post argument parsing - possible fix to zero-filled
parameter values
2012-10-11 10:46:06 +02:00
Michael Peter Christen
36c13ed15b less solr prefetch 2012-10-11 10:17:05 +02:00
Michael Peter Christen
f3fc8eac80 fixed clear scripts 2012-10-11 10:16:37 +02:00