Michael Peter Christen
433143ba40
removed protocol, tld, ext from the urlmask and created specific
navigation field for these
2012-12-19 12:45:40 +01:00
Michael Peter Christen
84f82541e8
search process enhancements
2012-12-19 10:41:22 +01:00
Michael Peter Christen
02020b590b
- removed all extension types from extension navigation which are not
proper/known
- automatically show the protocol navigation if there is more than http
and https
- automatically show the extension navigation if there is some media
content
2012-12-19 02:38:05 +01:00
Michael Peter Christen
01200f06cc
using the author field as solr-native facet. This makes it necessary to
introduce a copy-field for the author field to be copied to a string
field. This field is then used to generate facets. Without this field,
the facet would consist only of the words of the author names, not of
the full author string.
2012-12-19 01:56:33 +01:00
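The reason for the copy-field in commit 01200f06cc can be illustrated with a small sketch. This is not YaCy or Solr code; the class and method names are hypothetical, and the tokenization is a simplified stand-in for what an analyzed text field does. It only shows why faceting on a tokenized field yields single words while faceting on an untokenized string field yields full author names.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration, not part of YaCy or Solr.
class AuthorFacetSketch {
    // Counts facet values either per word (tokenized text field)
    // or per full value (string field fed by a copy-field).
    static Map<String, Integer> facet(String[] authors, boolean tokenized) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String author : authors) {
            String[] terms = tokenized
                ? author.toLowerCase().split("\\s+")
                : new String[] { author };
            for (String term : terms) counts.merge(term, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] authors = { "Jane Doe", "Jane Doe", "John Doe" };
        System.out.println(facet(authors, true));  // word-level facet
        System.out.println(facet(authors, false)); // full-string facet
    }
}
```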
Michael Peter Christen
2a4c064c89
using the publisher information for the author field if no author is
given. This applies to cases where only the copyright field in the html
header is filled but not the author field
2012-12-19 01:54:35 +01:00
Michael Peter Christen
bab573361f
- using a filter query for facet restriction
- calculating the whole search result in at most two sub-queries from
solr
2012-12-19 01:00:57 +01:00
Michael Peter Christen
eac9650b31
added another solr field clickdepth_i which reflects the number of
clicks which are necessary to get from the portal of a host to a
specific document. At this time, only the start document is flagged with
clickdepth '0', all others with '-1'. To get the actual clickdepth, a
process must use crawled information to collect the actual number of
clicks. This will be added in another/next step.
2012-12-18 17:20:42 +01:00
Michael Peter Christen
1052263af3
- added a new solr field references_i which stores the number of
INCOMING links to the corresponding web page. This information is taken
from the reverse link index (a 'little sister' of the RWI index).
- this field can be of use to enhance the ranking because a web page
with more incoming links can be more important than others. But
this is not true for typical link pages like menus. Therefore the
number of outgoing links is needed.
- added a new solr attribute 'bf' to solr queries which is a boost
function extension. This field can contain a formula which computes the
boost according to given field values. After some experiments the
following formula is now default:
div(add(1,references_i),pow(add(1,inboundlinkscount_i),1.6))^0.4
This takes the number of references and the inbound links. Further
experiments are needed to enhance that formula.
2012-12-18 14:42:35 +01:00
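To get a feeling for the default boost function from commit 1052263af3, the function part can be evaluated by hand. The sketch below is an assumption-laden illustration, not YaCy code: the class and method names are made up, and it only computes div(add(1,references_i),pow(add(1,inboundlinkscount_i),1.6)). The trailing ^0.4 is left out because, in Solr's bf syntax as far as I understand it, that suffix weights the function's contribution to the score rather than being an exponent.

```java
// Hypothetical illustration of the function value, not part of YaCy.
class BoostSketch {
    // (1 + references) / (1 + inboundLinks)^1.6
    static double boost(int references, int inboundLinks) {
        return (1.0 + references) / Math.pow(1.0 + inboundLinks, 1.6);
    }

    public static void main(String[] args) {
        System.out.println(boost(0, 0)); // 1.0
        System.out.println(boost(9, 0)); // 10.0
        System.out.println(boost(9, 3)); // smaller than 10.0: the link count dampens the boost
    }
}
```

With the exponent 1.6 in the denominator, the link count dampens the value faster than the reference count raises it.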
Michael Peter Christen
7c3de8b4cd
- fix for localhost detection
- added IPv6 patterns for localhost detection
2012-12-18 12:52:20 +01:00
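Commit 7c3de8b4cd does not show which patterns were added. As a rough sketch of what localhost detection with IPv6 support might look like, the following uses one pre-compiled pattern. The class name, method name, and the exact set of accepted forms are assumptions, not the actual YaCy implementation.

```java
import java.util.regex.Pattern;

// Hypothetical illustration, not the YaCy implementation.
class LocalhostSketch {
    // Accepts "localhost", IPv4 loopback 127.x.x.x, and the IPv6 loopback
    // in short (::1) and long (0:0:0:0:0:0:0:1, optional %zone) form.
    private static final Pattern LOCALHOST = Pattern.compile(
        "localhost|127\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}|::1|0:0:0:0:0:0:0:1(%.+)?",
        Pattern.CASE_INSENSITIVE);

    static boolean isLocalhost(String host) {
        if (host == null) return false;
        String h = host;
        // URLs write IPv6 literals in brackets, e.g. [::1]
        if (h.startsWith("[") && h.endsWith("]")) h = h.substring(1, h.length() - 1);
        return LOCALHOST.matcher(h).matches();
    }

    public static void main(String[] args) {
        System.out.println(isLocalhost("[::1]"));       // true
        System.out.println(isLocalhost("example.org")); // false
    }
}
```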
Michael Peter Christen
34f8786508
removed dependency of vocabulary navigation from Jena and its
triplestore; the vocabulary search is now done using generic solr fields
which are created on-the-fly during runtime.
2012-12-18 02:29:03 +01:00
reger
ad71747525
fix: set default language to "en"
2012-12-16 20:53:45 +01:00
Michael Peter Christen
9319b90d8a
- fixes for host navigation
- fixes for filetype navigation
- removed unused code
2012-12-15 09:14:49 +01:00
Michael Peter Christen
cb5cbec14d
distinguishing modified query string and original query string
2012-12-15 00:05:46 +01:00
Michael Peter Christen
fb0fa9a102
- fixed 'delete from subpath' during crawl start which deleted nothing;
now works;
- changed some crawl start html design details
2012-12-11 13:38:28 +01:00
orbiter
712cc37c40
if maxFileSize < 0 then the file size is not limited.
2012-12-10 21:17:45 +01:00
orbiter
1f33c30d7b
re-integrating useForHost method (lost sometime?) to get the noProxy
pattern working again. Without using this method all remote urls
including the localhost had been accessed through the configured proxy
2012-12-10 20:44:29 +01:00
reger
f1a9c2e604
fix Servlet template on conditional file include with use of conditional template pattern in included template file (example IndexCreateQueues_p.html)
see bug http://bugs.yacy.net/view.php?id=215
2012-12-10 20:02:35 +01:00
orbiter
a4a780b871
- fix for bad url conversion in bookmarks when using smb urls
- fix for localhost hosts in solr schema host handling
2012-12-10 07:22:42 +01:00
reger
e80dfeca23
- making blacklist path part case insensitive (solving http://bugs.yacy.net/view.php?id=171 )
- blacklist test: adding explicit response text "not blocked" if no blacklist match
2012-12-08 06:34:48 +01:00
reger
e2d499be9e
remove NOT NEEDED reference to solr.YaCySchema from ConfigurationSet to be able to use ConfigurationSet for other conf files (than solr.keys.default.list).
2012-12-08 00:19:20 +01:00
Michael Peter Christen
a3cd3852ab
introduced a better place to update the lastacc time value in latency
2012-12-07 15:49:23 +01:00
Michael Peter Christen
864abcd33d
removed Latency update after URL selection because that causes
a completely wrong behaviour when cache fresh cases appear. Makes
re-crawling MUCH faster!
2012-12-07 15:35:44 +01:00
Michael Peter Christen
dd241d03bb
latency fix: only set last-visit time if access was actually by the
robot
2012-12-07 02:00:12 +01:00
Michael Peter Christen
118233a7e6
fix for bad xml in gsa result when doing a query with quotes
2012-12-07 01:35:02 +01:00
Michael Peter Christen
1e002ab18e
added another blacklist-cleaner into balancer
2012-12-07 01:27:24 +01:00
Michael Peter Christen
10527e28ae
fix for wrong display of error urls in HostBrowser
2012-12-07 00:31:10 +01:00
Michael Peter Christen
756772fbd3
fix for waitingtime computation for intranet configuration
2012-12-06 17:40:52 +01:00
Michael Peter Christen
fa27e5820f
- check blacklist (again) when taking urls from the crawl stack because
the blacklist may get extended during crawling
- removed debug output
2012-12-06 00:12:16 +01:00
Michael Peter Christen
adfecc6ba8
more robustness during shutdown
2012-12-05 18:20:43 +01:00
Michael Peter Christen
d4bfe9339e
Brute-force attempt to start solr in case of a memory problem.
I don't actually know if this is correct. It is a desperate try to get
YaCy running on production servers which must get alive even with
strange hacks like this. This is also related to a forum posting in
http://forum.yacy-websuche.de/viewtopic.php?t=4528&p=27135#p27135
2012-12-05 18:16:06 +01:00
Michael Peter Christen
8aa08261a7
update to Solr Boost handling
2012-12-05 12:26:42 +01:00
Michael Peter Christen
908ad2f174
Added a new servlet to configure the solr ranking using field boosts
2012-12-03 17:01:19 +01:00
Michael Peter Christen
a01e47b992
enhanced exists()-method for solr; should reduce a lot of IO during DHT
target selection
2012-12-02 17:29:37 +01:00
Michael Peter Christen
72f165d58b
added a Boost class which stores solr query boost values. The class can
be configured using the yacy.init file. The boost information is taken
from the configuration each time when a query to solr is done.
2012-12-02 16:54:29 +01:00
Michael Peter Christen
b5ee88c6af
added more logging to get info which url causes performance problems
2012-12-02 16:52:12 +01:00
reger
1faa045dc1
fix: prevent regex pattern compile error for blacklist import for path '*' (extend it to '.*')
2012-12-01 22:41:21 +01:00
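The compile error fixed in commit 1faa045dc1 is easy to reproduce: a lone '*' is a quantifier with nothing to repeat, so java.util.regex.Pattern rejects it. The sketch below shows the failure and the extension to '.*'. The helper names toRegex and compiles are hypothetical and not taken from the YaCy blacklist code.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical illustration, not the YaCy blacklist code.
class BlacklistPathSketch {
    // Extends a bare '*' path to the valid regex '.*'; other paths pass through.
    static String toRegex(String path) {
        return "*".equals(path) ? ".*" : path;
    }

    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false; // e.g. dangling meta character '*'
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles("*"));          // false
        System.out.println(compiles(toRegex("*"))); // true
    }
}
```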
reger
6cf33f899c
prevent Solr "version conflict" on update by setting Solr "_version_" field to 0 (=no version check)
2012-11-28 00:09:53 +01:00
Michael Peter Christen
acd98bebb7
improvements in GSA result writer
2012-11-26 15:18:51 +01:00
Michael Peter Christen
3de784c8dd
replaced more split and replaceAll missing pattern pre-compilation with
pre-compiled pattern
2012-11-26 13:40:53 +01:00
Michael Peter Christen
8fc3679c66
using more pre-compiled patterns for split methods
2012-11-26 13:11:55 +01:00
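The two commits above (3de784c8dd and 8fc3679c66) replace String.split and String.replaceAll calls with pre-compiled patterns. A minimal sketch of that technique follows. The constant and method names are hypothetical. The point is that String.replaceAll(regex, ...) and, for most regular expressions, String.split(regex) compile the pattern on every call, while a static Pattern is compiled once.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of pattern pre-compilation.
class PrecompiledPatternSketch {
    // Compiled once at class load instead of on every call.
    private static final Pattern COMMA = Pattern.compile("\\s*,\\s*");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Equivalent to s.split("\\s*,\\s*") without recompiling the regex.
    static String[] splitList(String s) {
        return COMMA.split(s);
    }

    // Equivalent to s.replaceAll("\\s+", " ") without recompiling the regex.
    static String collapseWhitespace(String s) {
        return WHITESPACE.matcher(s).replaceAll(" ");
    }

    public static void main(String[] args) {
        System.out.println(String.join("|", splitList("a , b,c"))); // a|b|c
        System.out.println(collapseWhitespace("a   b\t c"));        // a b c
    }
}
```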
Michael Peter Christen
d48e9788d2
enhanced search result processing behavior
- query less at one time; query more often
- in between the small queries, evaluate results
- remove fields from search results which are not needed
2012-11-26 12:24:35 +01:00
Michael Peter Christen
bf512e6350
Merge branch 'master' of git://gitorious.org/~reger/yacy/bbyacy-rc1
2012-11-26 00:14:57 +01:00
reger
469efcdb9d
fix: display and calculate authors and namespace search navigator if configured (otherwise skip overhead)
(leave hosts, topics, and the filetype and protocol navigators, which are not included in ConfigPortal, untouched)
2012-11-25 22:49:26 +01:00
Michael Peter Christen
eca68fa197
added debug code to crawler monitor
2012-11-25 15:43:42 +01:00
Michael Peter Christen
205f8b222b
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2012-11-25 14:41:49 +01:00
orbiter
ee612e8b93
start the local search only if this peer is doing a remote search or
when it is doing a local search and the peer is old
2012-11-25 11:58:57 +01:00
Michael Peter Christen
d465773a37
- removed multi-add of documents (not used)
- inserted specialized code for size request
2012-11-25 01:34:39 +01:00
Michael Peter Christen
a1a4d9aa94
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
Conflicts:
source/net/yacy/cora/federate/solr/connector/MirrorSolrConnector.java
2012-11-24 22:31:46 +01:00
Michael Peter Christen
b7004043ea
- added a field cache for solr queries which call only for a single
value
- fixed a version conflict exception within a solr add request
2012-11-24 22:30:05 +01:00
orbiter
5aa5202adf
fixes for filesystem indexing
2012-11-24 10:27:29 +01:00
Michael Peter Christen
efd2c4622d
added a new fail type attribute for the index to distinguish two
separate fail types: network fail and forced exclusion (i.e. by robots
or forwarding rules).
2012-11-23 14:00:30 +01:00
Michael Peter Christen
5e182a566f
- added another enumeration method in kelondro data structure to get a
more random access to data for the balancer
- added random access inside the balancer
2012-11-23 13:58:39 +01:00
Michael Peter Christen
4eab3aae60
removed overhead by preventing generation of full search results when
only the url is requested
2012-11-23 01:35:28 +01:00
Michael Peter Christen
a114bb23bb
- using edismax in gsa interface
- generating less field data for gsa search results
- using a boost query in gsa interface to move double content to the end
of the result list
2012-11-22 13:03:33 +01:00
Michael Peter Christen
d6b82840f8
added a feature to find similarities in documents.
This uses an enhanced version of the Nutch/Solr TextProfileSignature.
As a result, a signature of the document is written to the solr search
index. Additionally, each time a signature is written, it is
checked if the signature exists already in the index. If the signature
does not exist, the document is marked as unique. The unique attribute
can now be used to sort document lists and bring duplicates to the end
of a result list.
To enable this, a large portion of the search api to Solr had to be
changed. This affected mainly caching of 'exists' searches to enhance
the check for existing signatures and do this without actually doing a
solr query.
Because this is the first time a long number is used as value in the Solr
store, the value naming in the YaCySchema also had to be adapted and
normalized. As a consequence, many files had to be changed.
2012-11-21 18:46:49 +01:00
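The duplicate detection described in commit d6b82840f8 can be sketched as follows. This is a strongly simplified stand-in: the signature here is a plain 64-bit FNV-1a hash over normalized text, not the enhanced TextProfileSignature, and an in-memory set takes the place of the cached 'exists' lookup against the index. All names are hypothetical.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration, not the YaCy implementation.
class UniqueSketch {
    // Stand-in for the cached 'exists' check against the Solr index.
    private final Set<Long> seenSignatures = new HashSet<>();

    // Simplified signature: normalize the text, then hash it to a long (FNV-1a).
    static long signature(String text) {
        String normalized = text.toLowerCase(Locale.ROOT)
            .replaceAll("[^a-z0-9]+", " ").trim();
        long h = 0xcbf29ce484222325L;
        for (int i = 0; i < normalized.length(); i++) {
            h ^= normalized.charAt(i);
            h *= 0x100000001b3L;
        }
        return h;
    }

    // Returns true if no document with the same signature was written before.
    boolean writeAndCheckUnique(String text) {
        return seenSignatures.add(signature(text));
    }

    public static void main(String[] args) {
        UniqueSketch index = new UniqueSketch();
        System.out.println(index.writeAndCheckUnique("Hello, World!")); // true
        System.out.println(index.writeAndCheckUnique("hello world"));   // false
    }
}
```

Sorting a result list by such a unique flag would then move the second, near-identical document to the end.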
Michael Peter Christen
f5ca5cea44
- added field options to all solr queries. This can be used to restrict
the actual data which is fetched from solr.
- used the new field options to reduce generic options like getting the
load date or the count of search results. This should increase overall speed
- used the new field options to reduce overhead in the host browser
during acquisition of links.
- used the field options to make checking of links in crawler faster
- if the crawler is paused, the crawl queue is not cleaned
2012-11-19 17:24:34 +01:00
Michael Peter Christen
46be4af5b9
Merge commit '2bb8f045cc92f31fc7e720cc30b38af417563890'
2012-11-18 22:11:04 +01:00
Michael Peter Christen
832eead998
Merge remote-tracking branch 'regerdev/master'
2012-11-18 22:04:11 +01:00
Michael Peter Christen
952e143580
FINALLY YaCy can now search for full strings using double- or
single-quoted strings in the search query line!!!
2012-11-18 16:03:34 +01:00
orbiter
5dfd6359cb
redesign of the QueryParams class: introduced QueryGoal which holds the
query string parser. This shall be used to create a proper full-string
matching which is handled then by QueryGoal.
2012-11-18 01:22:41 +01:00
cominch
2bb8f045cc
content control: use up-to-date definitions
2012-11-13 17:32:19 +01:00
Michael Peter Christen
5fd3b93661
added deletion of hosts during crawl start if deleteold option was given
2012-11-13 16:54:28 +01:00
Michael Peter Christen
d64445c3cb
because we have the inurl:<term> - searchmodifier, we don't actually
need regular expressions as search attributes. They have now been removed
from the advanced search page while they are still created internally.
The filter is then expressed against solr as regular expression filter
query. If the expression points out a selection of a specific protocol,
host or filetype this is then translated into a faceted query.
2012-11-13 11:45:56 +01:00
cominch
a67ff1c8ac
SMW Import: replaced JSON import routines with stable ones
2012-11-12 11:17:50 +01:00
cominch
d2a94cc55e
refactor package
2012-11-09 16:22:24 +01:00
cominch
05742b4562
remove old SMW importer which was part of the ymarks package
2012-11-09 15:44:59 +01:00
cominch
21df1ad9e0
update and generalization of the SMW import and content control routines
2012-11-09 13:48:40 +01:00
Michael Peter Christen
842faf96a2
fixed media search
2012-11-07 17:27:13 +01:00
Michael Peter Christen
93001586a0
removed warnings, removed too-fast pausing of crawls
2012-11-07 15:37:14 +01:00
Michael Peter Christen
8041742e48
added matching of path to query pattern
2012-11-07 15:06:13 +01:00
Michael Peter Christen
8b1c9cba3d
fixed a problem with non-terminating crawls
2012-11-07 15:05:44 +01:00
Michael Peter Christen
61a1d32356
fix to ftp client
2012-11-07 14:58:28 +01:00
Michael Peter Christen
5105256927
update to search result logging (this was a remaining issue from the
solr 4.0.0 migration)
2012-11-07 14:15:27 +01:00
Michael Peter Christen
570e42c4e3
fix for filetype navigator
2012-11-07 13:53:29 +01:00
Michael Peter Christen
71ed8e5e07
bugfixes for crawler
2012-11-07 12:52:19 +01:00
Michael Peter Christen
12c0db20e5
fixed npe for surrogate import
2012-11-07 02:46:51 +01:00
Michael Peter Christen
52df6ee369
more logging
2012-11-07 02:04:08 +01:00
Michael Peter Christen
158732af37
automatically delete entries from the crawl profile list if crawl is
terminated.
2012-11-07 02:03:44 +01:00
Michael Peter Christen
15d1460b40
added information about the reason of pausing of crawls
2012-11-06 15:21:56 +01:00
Michael Peter Christen
2371ef031c
added solr faceted search support to YaCy search results
added solr highlighting / YaCy snippets to YaCy search results
- facets are now much more complete
- facets are computed and searched much faster
- snippet computation is done by solr if solr knows the snippet
2012-11-06 14:32:08 +01:00
Michael Peter Christen
b30a7162fa
added more thread-renaming for search processes
2012-11-06 12:31:23 +01:00
Michael Peter Christen
900445d8e9
set the thread name during solr queries to the solr query to get better
debugging options
2012-11-06 11:48:04 +01:00
Michael Peter Christen
d481abd087
added the visualization of error-urls to host browser
- only visible for admins
- a faceted search generates a huge list for all hosts in the host list
- the faceted search algorithms had to be modified for that
- within the browsing of the directory path, the error cause is written
to the url which is presented as error-url
- the errors are also accumulated for directory sums
2012-11-06 00:29:37 +01:00
Michael Peter Christen
a15819fbec
fix for some interface problems
2012-11-05 22:14:52 +01:00
Michael Peter Christen
791e1dcfdf
when a new crawl is started, delete all entries about error-urls for
crawl-start domains
2012-11-05 22:14:27 +01:00
Michael Peter Christen
619bf7e875
fixed filetype modifier for media types in text search
2012-11-05 18:08:00 +01:00
Michael Peter Christen
97f82994a6
automatically pause the crawler if there is a problem with solr
2012-11-05 16:34:42 +01:00
Michael Peter Christen
8fb370d9f8
renovated the way search results are counted. should be correct now...
2012-11-05 03:19:28 +01:00
Michael Peter Christen
7bec253bb0
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2012-11-04 09:21:58 +01:00
Michael Peter Christen
d88eb657fd
Merge branch 'master' of git://gitorious.org/~reger/yacy/bbyacy-rc1
2012-11-04 09:21:21 +01:00
orbiter
354ef8000d
- added 'deleteold' option to crawler which causes that documents are
deleted which are selected by a crawl filter (host or subpath)
- site crawl uses this option by default now
- made option to deleteDomain() concurrency
2012-11-04 02:58:26 +01:00
reger
633fbe9188
Fix Metadata handling
- language default on missing lang property to "uk" (fix set to nothing)
- language set to TLD (added call to existing language calculation from TLD)
- coordinate number exception on possible lat/lon content of "NaN,NaN"
adjust Netbeans IDE classpath (for Solr/Lucene 4.0.0 jars)
2012-11-04 02:07:59 +01:00
Michael Peter Christen
75dd706e1b
update to HostBrowser:
- time-out after 3 seconds to speed up display (may be incomplete)
- showing also all links from the balancer queue in the host list (after
the '/') and in the result browser view with tag 'loading'
2012-11-02 13:57:43 +01:00
Michael Peter Christen
e2c4c3c7d3
migration to solr 4.0.0
2012-11-02 12:29:48 +01:00
Michael Peter Christen
b764de424a
code cleanup
2012-11-02 10:28:32 +01:00
Michael Peter Christen
9330ad4838
- fixed the delete option in host browser
- added a delete method which can be used to delete a full subpath in
solr.
2012-11-02 01:22:31 +01:00
Michael Peter Christen
a63179f3f9
added the MIME attribute for the R tag in GSA search result writer
2012-11-02 00:14:29 +01:00
Michael Peter Christen
1168d09de8
more refactoring - integrated the code of SnippetProcess into
SearchEvent
2012-11-01 17:40:06 +01:00
Michael Peter Christen
6629e37685
tried to clean up the search process mess
2012-11-01 17:16:43 +01:00
Michael Peter Christen
c5f67a5d6d
fixed a problem with local search from solr results: now all results
from solr are shown (again)
2012-11-01 10:22:22 +01:00
Michael Peter Christen
f8f05ecba7
- added a delete button in host browser to delete a complete subpath
- removed storage of default collection name - default is now "user"
- made stacking of crawl start points concurrent
2012-10-31 17:44:45 +01:00
Michael Peter Christen
0716a24737
added more / all new crawl profile fields into crawl profile editor
2012-10-31 15:13:05 +01:00
Michael Peter Christen
4a14122ba7
in case that a crawl profile has a collection assigned, use the
collection to show a name in the web interface. This should prevent
overly long names from making the interface unusable.
2012-10-31 14:08:33 +01:00
Michael Peter Christen
0fe8be7981
enhanced data structures for balancer and latency computation which
should produce a bit better prognosis about forced waiting times.
2012-10-30 17:30:24 +01:00
Michael Peter Christen
ac9540dfb6
removed options for stopwords which are not used
2012-10-30 12:36:36 +01:00
Michael Peter Christen
ce3fed8882
added the Google Search Appliance (GSA) api interface to the main menu.
See:
https://developers.google.com/search-appliance/documentation/68/xml_reference#request_overview
2012-10-30 12:27:22 +01:00
Michael Peter Christen
b2ffd49817
less latency
2012-10-30 12:26:32 +01:00
Michael Peter Christen
0833937c1c
better balancing and duetime-computation also for no-delay intranet
hosts
2012-10-30 11:28:49 +01:00
Michael Peter Christen
c326aa8f67
disabled writing new entries to crawl stacks to prevent that a domain
with many documents blocks refreshing of the crawl queue
2012-10-29 22:26:52 +01:00
Michael Peter Christen
6905182d41
- fix for number of words log message
- adding meta:refresh also to crawler stack
2012-10-29 21:42:31 +01:00
Michael Peter Christen
c25d7bcb80
- added concurrency for robots.txt loading
- changed data model for domain counter
2012-10-29 21:08:45 +01:00
Michael Peter Christen
a94c537afc
fixed getSize() which can use the cache size while the crawl is running
2012-10-29 11:56:07 +01:00
Michael Peter Christen
96912c9471
enhancement to solr caching: consider that during a get() the document
is not in solr but the cache points out that a commit is needed to get
the document.
2012-10-29 11:35:24 +01:00
Michael Peter Christen
a87811bc38
more auto-commit calls when a search interface is opened, but not when a
search is done there to prevent blocking during search-time.
2012-10-29 11:27:13 +01:00
Michael Peter Christen
3d3d654e88
if a network configuration is chosen which allows neither DHT nor P2P
communication (robinson mode), then some menu entries are
disabled which have no use in this mode.
2012-10-29 01:51:19 +01:00
Michael Peter Christen
2d9e577ad0
replaced the custom robots.txt loader by the standard http loader
2012-10-28 22:48:11 +01:00
Michael Peter Christen
799d71bc67
enhanced solr caching:
- increased cache size which is needed for longer solr commit time
- speed hacks on cache write code
2012-10-28 20:31:29 +01:00
Michael Peter Christen
a33e2742cb
- removed unnecessary synchronized and deadlock in crawler
- removed problem with monitoring object on Balancer.wait
- added missing user agent settings
2012-10-28 19:56:02 +01:00
orbiter
8952153ecf
update to Balancer algorithm:
- create a load list from the current list of known hosts
- do not create this list for each Balancer.pop access
- create the list from those hosts which have a zero-waiting time
- select 1/3 from that list which have the most urls waiting
- get hosts from the waiting list in random order
- fixes for some delta-time computations
- always load all urls from hosts which have never been loaded before
2012-10-28 13:24:49 +01:00
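The load-list steps listed in commit 8952153ecf can be sketched roughly as below. This is not the YaCy Balancer: the Host type, its fields, and the rounding of "1/3" are assumptions, and the special handling of hosts that were never loaded before is omitted. The sketch keeps hosts with zero waiting time, takes the third with the most waiting URLs, and returns them in random order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration, not the YaCy Balancer.
class LoadListSketch {
    static final class Host {
        final String name;
        final long waitingMs; // remaining forced waiting time
        final int queuedUrls; // urls waiting for this host
        Host(String name, long waitingMs, int queuedUrls) {
            this.name = name;
            this.waitingMs = waitingMs;
            this.queuedUrls = queuedUrls;
        }
    }

    static List<String> loadList(List<Host> knownHosts, Random random) {
        // 1. keep only hosts with zero waiting time
        List<Host> ready = new ArrayList<>();
        for (Host h : knownHosts) if (h.waitingMs <= 0) ready.add(h);
        // 2. select the third of them with the most urls waiting (at least one host)
        ready.sort((a, b) -> Integer.compare(b.queuedUrls, a.queuedUrls));
        int keep = ready.isEmpty() ? 0 : Math.max(1, ready.size() / 3);
        List<String> names = new ArrayList<>();
        for (Host h : ready.subList(0, keep)) names.add(h.name);
        // 3. hand hosts out in random order
        Collections.shuffle(names, random);
        return names;
    }

    public static void main(String[] args) {
        List<Host> hosts = new ArrayList<>();
        hosts.add(new Host("a", 0, 9));
        hosts.add(new Host("b", 0, 1));
        hosts.add(new Host("c", 500, 50));
        hosts.add(new Host("d", 0, 4));
        System.out.println(loadList(hosts, new Random())); // [a]
    }
}
```

Building this list once and reusing it, instead of recomputing it on every pop, matches the second bullet of the commit message.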
orbiter
354f0d9acd
moved static method from ClusteredScoreMap to MapDataMining because it
was not used in the ClusteredScoreMap class but only in MapDataMining
2012-10-28 11:29:53 +01:00
reger
722a447b0d
- optimize code of augmented parsing to enhance document tags
- commented out augmentedparser.analyse (function not implemented yet)
- adjust init of document title list to always use same list type
2012-10-26 18:50:45 +02:00
Michael Peter Christen
8e1248ffe3
force a commit in advance of a search for the administrator to get most
recent results even if commit time is high and an indexing is ongoing.
2012-10-26 15:35:42 +02:00
Michael Peter Christen
3b48c78190
added an option to force a commit to solr.
may be used by a search front-end in case that the commitWithinMs time
is too short to get recently indexed documents.
2012-10-26 07:39:07 +02:00
sixcooler
2d972f289a
raise commitWithinMs to default-value from SwitchBoard
(results in lower HD I/O)
no dots in memory-graph (there are too many of them)
2012-10-26 02:12:45 +02:00
orbiter
8fde1dd3b6
another performance and memory hack to graphics: this makes it possible
to produce a 100-Megapixel png network graphic image on my 6 year old
laptop in standard configuration in 10 seconds.
2012-10-25 21:40:27 +02:00
Michael Peter Christen
1baf498d59
- show more lines in online log
...
- reverse order is default now
2012-10-25 18:38:39 +02:00
Michael Peter Christen
55bdafbaf1
more image processing hacks
2012-10-25 18:20:05 +02:00
Michael Peter Christen
f2d0418218
because the new PngEncoder had a problem with the PixelGrabber which is
caused by a JRE bug, the PixelGrabber had to be circumvented using an
own frame buffer which can be read without a PixelGrabber. This resulted
in ultra-fast and much less memory-consuming transformation. YaCy images
are now generated really fast!
2012-10-25 17:59:20 +02:00
Michael Peter Christen
d5d64019e5
- added a method for the RasterPlotter to draw arrow endings to lines
- replaced the dot in the NetworkGraph with arrows
- enhanced the image drawing speed using pre-computed color values
- added more attention for OOM cases during very large image painting
2012-10-25 16:05:04 +02:00
Michael Peter Christen
85ca07b90e
when a new crawl is started, an equal crawl, if still running, is
terminated and the corresponding crawl profile is deleted (this also
clears the crawl queue entries for that crawl profile)
2012-10-25 10:20:55 +02:00
Michael Peter Christen
906e51214a
the web structure image shows the pivot dot in a different color
2012-10-25 10:18:28 +02:00
Michael Peter Christen
b3ffcde0c7
- prepared PngEncoder for concurrency: PixelGrabber.grabPixels is the
main time-consuming process. This shall be done in concurrency.
- added concurrent processes to call the PixelGrabber and framework to
do that (queues)
It is now possible to create 4k-Images (3840x2160) i.e. with the Network
Graphics servlet
2012-10-24 02:08:51 +02:00
Michael Peter Christen
e9c6f4ce2e
- new order of data computation: first compute the size of
compressed deflater output, then assign an exact-sized byte[] which
makes resizing afterwards superfluous
- after all enhancements all class objects were removed; result is just
one short static method
- made objects final where possible
2012-10-24 00:41:09 +02:00
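The "size first, then exact-sized byte[]" idea from commit e9c6f4ce2e can be sketched with java.util.zip.Deflater. How the PngEncoder actually measures the size is not stated in the commit message. The two-pass approach below (count the output bytes, reset, then deflate into an exactly sized array) is one possible reading, and all names are hypothetical.

```java
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration, not the YaCy PngEncoder.
class ExactDeflateSketch {
    static byte[] compress(byte[] raw) {
        Deflater deflater = new Deflater();
        // Pass 1: only count how many bytes the compressed stream needs.
        byte[] scratch = new byte[4096];
        deflater.setInput(raw);
        deflater.finish();
        int size = 0;
        while (!deflater.finished()) size += deflater.deflate(scratch);
        // Pass 2: compress again into an array of exactly that size,
        // so no growing or trimming of buffers is needed afterwards.
        deflater.reset();
        deflater.setInput(raw);
        deflater.finish();
        byte[] out = new byte[size];
        int written = 0;
        while (!deflater.finished() && written < size) {
            written += deflater.deflate(out, written, size - written);
        }
        deflater.end();
        return out;
    }

    // Helper for checking the round trip.
    static byte[] decompress(byte[] packed, int rawLength) {
        Inflater inflater = new Inflater();
        inflater.setInput(packed);
        byte[] raw = new byte[rawLength];
        try {
            int n = 0;
            while (n < rawLength && !inflater.finished()) {
                int got = inflater.inflate(raw, n, rawLength - n);
                if (got == 0) break; // no progress possible; stop instead of spinning
                n += got;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end();
        }
        return raw;
    }

    public static void main(String[] args) {
        byte[] raw = new byte[10000]; // all zeros, compresses well
        System.out.println(compress(raw).length < raw.length); // true
    }
}
```

The trade-off is compressing twice in exchange for a single allocation. Whether that pays off depends on image size and memory pressure.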
orbiter
c6a1b21399
added a 9-year old png encoder from David Eisenberg which I rewrote
quite a bit to remove all code that handles transparency. With this
highly specialized png writer it is possible to write png images much
faster than with the JRE built-in png writer.
In a second step it can be possible to add concurrency to increase
computation speed further.
2012-10-23 23:27:41 +02:00
orbiter
276dd6452b
removed warnings
2012-10-23 19:08:44 +02:00
Michael Peter Christen
b991685782
Merge branch 'master' of git://gitorious.org/~reger/yacy/bbyacy-rc1
2012-10-23 18:14:58 +02:00
Michael Peter Christen
ea11a1efea
fix for highlighting in gsa search
2012-10-23 18:11:49 +02:00
Michael Peter Christen
9eaede50e7
enhanced web structure images
2012-10-23 18:11:19 +02:00
Michael Peter Christen
b7ac1da6a3
gsa results shall have only one title in metadata and that should be the
visible title in the <title>-tag
2012-10-23 18:03:12 +02:00
Michael Peter Christen
ae6feb5610
showing the web structure graph as animation in the crawl monitor
2012-10-23 02:50:26 +02:00
reger
87aab9aa7c
- fix: with augmented parsing = on; missing metadata in index (like title) due to overwriting metadata by adding multiple result docs from augmentparser with same url
- fix Document.addsubdocuments: sections might be initialized as Arrays.asList which does not provide the used .addAll method
see e.g. http://kamleshkr.wordpress.com/2010/02/17/inside-java-arrays-aslistt-a/
2012-10-22 22:48:35 +02:00
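The second fix in commit 87aab9aa7c concerns a well-known property of java.util.Arrays.asList: it returns a fixed-size list backed by the array, so adding elements throws UnsupportedOperationException. The sketch below reproduces that and shows the usual remedy of copying into an ArrayList. The helper name canGrow is hypothetical, and the actual change in Document.addsubdocuments may look different.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the Arrays.asList pitfall.
class AsListSketch {
    // Returns false if the list refuses to grow.
    static boolean canGrow(List<String> sections) {
        try {
            sections.addAll(Arrays.asList("new section"));
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> fixed = Arrays.asList("title");                     // fixed-size view
        List<String> growable = new ArrayList<>(Arrays.asList("title")); // independent copy
        System.out.println(canGrow(fixed));    // false
        System.out.println(canGrow(growable)); // true
    }
}
```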
Michael Peter Christen
39317a6c66
enhanced webstructure image: introduced
- multiple hosts can be listed (comma-separated) as host argument
- new 'bf' attribute (branch factor): the maximum number of edges per
node
- the bf-value is computed automatically
- ordering of nodes when the graphic is drawn: mostly the drawing ends
with a limitation, e.g. the number of nodes. When this happens, it should be
ensured that more 'interesting' nodes are painted in advance. This is
now done by sorting all nodes by the number of links they have in the
distant sub-graph.
2012-10-22 16:23:39 +02:00
sixcooler
47ae7e322e
smaller dhtDispatcher.cloudSize
@Orbiter: we talked about this some time ago - please revert if I'm wrong
2012-10-21 20:05:28 +02:00
sixcooler
57ddd63888
do not hold an expensive cache of references for DHT-out, but load them
on demand
see: http://forum.yacy-websuche.de/viewtopic.php?f=8&t=4530
2012-10-21 20:00:36 +02:00
Michael Peter Christen
ea27d2e5f6
fixed more getSolrFieldName usages
2012-10-18 15:21:05 +02:00
Michael Peter Christen
ce0e5b1e17
- more refactoring / private methods
- fix for usage of custom solr field names
2012-10-18 15:09:04 +02:00
Michael Peter Christen
ccc3760a47
Refactoring and redesign of data architecture to make URIMetadataRow
superfluous. The target is to make a solr document as the core of YaCy
documents which would cause that many conversions can be removed. On the
way to this target the Equivalence of URIMetadataRow and URIMetadataNode
had to be removed to expose the usage of the old URIMetadataRow data
structure.
This refactoring already removes unnecessary conversions and should
make memory usage during indexing lower.
2012-10-18 14:29:11 +02:00
Michael Peter Christen
b400fc7b4d
fix for file parser problem
2012-10-17 18:06:44 +02:00
Michael Peter Christen
e5b3c172ff
removed hack which translated Solr documents to virtual RWI entries
which had then been mixed with remote RWIs. Now these Solr documents are
fed into the result set as they appear during local and remote
search. That makes the search much faster.
2012-10-17 17:45:41 +02:00
Michael Peter Christen
6017691522
added an exception catch
2012-10-17 13:56:11 +02:00