reger
46016fa153
autoupdate fails to download latest release (1.71) due to default release blacklist
- removed the default version blacklist regex from init (for future versions)
!!! left existing update blacklist setting untouched !!!
(existing installations that want to autoupdate to 1.71 need to change the blacklist in ConfigUpdate_p.html)
- moved old blacklist patch to migration.java
2014-04-13 07:32:32 +02:00
Michael Peter Christen
7c7fbb9818
find depth-matches also for edge targets
2014-04-11 12:27:21 +02:00
Michael Peter Christen
dd12dd392f
introduced a data structure for HyperlinkEdges which should use
less memory, as it does not store the source link redundantly for each edge
of the graph.
2014-04-11 12:09:33 +02:00
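A minimal Python sketch of the idea in the commit above. It assumes that the memory saving comes from keeping each source URL only once and grouping its link targets under it. The class and method names (`HyperlinkEdges`, `add_edge`, `targets`) are hypothetical and do not reproduce YaCy's actual Java implementation.

```python
from collections import defaultdict


class HyperlinkEdges:
    """Hypothetical sketch of an edge store for a link graph.

    Each source URL is stored once, as a dictionary key, and all of its
    link targets live in one set under that key, instead of keeping one
    (source, target) pair object per edge."""

    def __init__(self):
        self._targets_by_source = defaultdict(set)

    def add_edge(self, source, target):
        # Duplicate edges collapse because targets are kept in a set.
        self._targets_by_source[source].add(target)

    def targets(self, source):
        # .get() avoids creating an empty entry for unknown sources.
        return frozenset(self._targets_by_source.get(source, ()))

    def source_count(self):
        return len(self._targets_by_source)

    def edge_count(self):
        return sum(len(t) for t in self._targets_by_source.values())
```

With three edges from two sources, only two source strings are held, however many targets each source has.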
Michael Peter Christen
6ea8bb7348
use MultiProtocolURL for edge data, which is faster (hash computation
is now much simpler) and smaller in size
2014-04-11 10:58:37 +02:00
Michael Peter Christen
b21c208b4d
enhanced hashcode computation for MultiProtocolURL
2014-04-11 10:23:48 +02:00
Michael Peter Christen
ce1d1b2fa0
fix for maximum tag length in parser
2014-04-11 09:56:44 +02:00
Michael Peter Christen
17e0956312
refactoring of SystemLoad calls (only one backend tool)
2014-04-11 09:25:18 +02:00
Michael Peter Christen
a37d067692
refactoring
2014-04-10 23:46:35 +02:00
orbiter
95780eed32
Merge branch 'master' of git@gitorious.org:yacy/rc1.git
2014-04-10 21:40:54 +02:00
Michael Peter Christen
67beef657f
major redesign of the html parser: object recursion is now done using a
stack of html tag objects instead of a recursive parse-again method,
which could cause bad performance and huge memory allocation. The new
method also produces better parsed image objects with exact anchor text
references.
2014-04-10 18:58:03 +02:00
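The commit above replaces recursive re-parsing with a stack of open tag objects. The Python sketch below illustrates that general technique using the standard-library `html.parser.HTMLParser`. The class `StackParser` and its fields are hypothetical. YaCy's own parser is written in Java and differs in detail.

```python
from html.parser import HTMLParser


class StackParser(HTMLParser):
    """Hypothetical sketch: nesting is tracked with an explicit stack of
    open tag objects instead of re-parsing inner content recursively."""

    VOID_TAGS = {"img", "br", "hr", "meta", "link", "input"}

    def __init__(self):
        super().__init__()
        self.stack = []    # open tags as dicts: name, attrs, collected text
        self.anchors = []  # closed anchors as (href, anchor text)
        self.images = []   # images as (src, href of enclosing anchor or None)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img":
            # Look up the innermost enclosing <a> on the stack, if any.
            href = next((t["attrs"].get("href") for t in reversed(self.stack)
                         if t["name"] == "a"), None)
            self.images.append((attrs.get("src"), href))
        if tag not in self.VOID_TAGS:
            self.stack.append({"name": tag, "attrs": attrs, "text": []})

    def handle_endtag(self, tag):
        # Pop back to the matching opener; unclosed inner tags are dropped.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i]["name"] == tag:
                closed = self.stack[i]
                del self.stack[i:]
                if tag == "a":
                    text = "".join(closed["text"]).strip()
                    self.anchors.append((closed["attrs"].get("href"), text))
                return
        # An end tag without a matching opener is ignored.

    def handle_data(self, data):
        # Text belongs to every element that is currently open.
        for t in self.stack:
            t["text"].append(data)
```

Each character of text is visited once. Closing a tag pops everything above its matching opener, so a stray unclosed inner tag such as `<i>` is discarded without any text being parsed again.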
Michael Peter Christen
6bd8c6f195
fix for wrong status codes of error pages
2014-04-10 09:08:59 +02:00
Michael Peter Christen
9e503b3376
also delete the robots.txt file from the cache when a new crawl is
started
2014-04-09 21:59:54 +02:00
orbiter
67501c9dda
Merge branch 'master' of git@gitorious.org:yacy/rc1.git
2014-04-09 19:58:54 +02:00
Michael Peter Christen
1c21b3256d
fix for robots.txt handling: delete old entry before starting a new
crawl.
2014-04-09 18:33:48 +02:00
orbiter
c250fac9f4
linkstructure refactoring to get more options for clickdepth analysis
2014-04-09 17:52:51 +02:00
Michael Peter Christen
8068e68474
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2014-04-09 12:45:15 +02:00
Michael Peter Christen
bd886054cb
new structure and enhancements for link graph computation:
- added order option to solr queries to be able to retrieve document
lists in a specific order, here: link length
- added HyperlinkEdge class which manages the link structure
- integrated the HyperlinkEdge class into clickdepth computation
- extended the linkstructure.json servlet to also show the clickdepth
and other statistical information
2014-04-09 12:45:04 +02:00
reger
f326a67561
fix: typo in default charset in metadata2solr
update pom and NB build to Solr 4.7.1 libs
2014-04-06 22:31:22 +02:00
Michael Peter Christen
df138084c0
do solr optimization independently from memory and load constraints:
- not doing an optimization will likely cause a 'too many files' exception
- without optimization, performance will get even worse, which would
prevent optimization in the future as well (the change avoids this
deadlock situation)
2014-04-06 11:04:23 +02:00
Michael Peter Christen
ebd44a7080
replaced solr 4.6.1 with solr 4.7.1 and added index migration to
lucene_47
2014-04-06 10:45:03 +02:00
Michael Peter Christen
734778c0c8
fixed a time-out problem in the default servlet, which was also a logging
problem because the error log showed the wrong reason (file not found)
instead of the actual reason (time-out).
2014-04-04 15:27:29 +02:00
Michael Peter Christen
466d90ad42
fixed a problem with the resource observer, probably caused by uncaught
exceptions within the apache library which appear only in concurrent
environments.
2014-04-04 15:26:39 +02:00
Michael Peter Christen
e8ddd415a8
enhanced the new link structure graph
2014-04-04 14:43:54 +02:00
Michael Peter Christen
926d28dd3f
fixed a bug which prevented crawl starts after a network switch
2014-04-04 14:43:35 +02:00
Michael Peter Christen
3ce8eff21b
another fix for inbound/outbound detection
2014-04-04 12:41:59 +02:00
Michael Peter Christen
d4b5c457e4
NPE fix
2014-04-04 12:34:34 +02:00
Michael Peter Christen
36a66b0704
fix for parsing numeric values when boolean values are given
2014-04-04 11:59:51 +02:00
orbiter
41730c8048
better logging in template engine: shows the filename of servlets where
errors in templates occur
2014-04-04 10:55:46 +02:00
orbiter
3c1274057d
fixed thread dump in case of wrong seeds
2014-04-04 10:54:56 +02:00
orbiter
18f9c40302
moved Edge class out of linkstructure servlet as this does not work in
non-eclipse driven environments (all non-dev cases)
2014-04-04 10:54:11 +02:00
orbiter
de95e5e524
reduced search activity corona strength in network image
2014-04-04 10:08:44 +02:00
reger
da413af664
move baseurl after parsing orig source in urlproxyservlet
to calculate absolute href links for the rewrite from the unmodified source.
2014-04-04 03:11:16 +02:00
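One plausible reading of the commit above is that links must be made absolute against the original page URL before the proxy rewrites them. Otherwise, relative links would later resolve against the proxy itself. The following is a minimal Python sketch of that order of operations using `urllib.parse.urljoin`. The `PROXY_PREFIX` endpoint and the `rewrite_href` function are hypothetical, not the urlproxyservlet API.

```python
from urllib.parse import quote, urljoin

# Hypothetical proxy endpoint; not YaCy's real servlet path.
PROXY_PREFIX = "/proxy?url="


def rewrite_href(original_page_url, href):
    """Resolve href against the unmodified page URL, then route it via the proxy."""
    # urljoin turns "a.html", "../c" or "/b" into absolute URLs and
    # leaves already absolute URLs unchanged.
    absolute = urljoin(original_page_url, href)
    # safe="" percent-encodes ':' and '/' so the URL fits in one query value.
    return PROXY_PREFIX + quote(absolute, safe="")
```

For a page at `http://example.org/dir/page.html`, the href `../c` first becomes `http://example.org/c` and only then is wrapped in the proxy URL.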
reger
af6ad20728
fix: remove obsolete ref to yacy.home
(use Switchboard instead)
2014-04-04 02:45:04 +02:00
Michael Peter Christen
74ab094587
fix for solr query size; too many documents were retrieved when
fewer than _pagesize_ had been requested.
2014-04-03 13:42:10 +02:00
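The commit above fixes over-fetching when fewer than _pagesize_ documents are wanted. The following sketch shows the usual remedy, assuming paging with Solr's `start` and `rows` query parameters: the last request asks only for the remaining count. The generator `page_windows` is hypothetical and is not taken from the YaCy code.

```python
def page_windows(total, pagesize):
    """Yield (start, rows) pairs covering `total` documents.

    The last window asks only for the documents still missing, so a
    request never fetches more than the caller wanted."""
    if pagesize <= 0:
        raise ValueError("pagesize must be positive")
    start = 0
    while start < total:
        rows = min(pagesize, total - start)
        yield start, rows
        start += rows
```

Requesting 25 documents with a page size of 10 yields windows of 10, 10 and 5 rows, and requesting 3 documents yields a single window of 3 rows instead of 10.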
Michael Peter Christen
c64c10ef00
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2014-04-03 01:58:06 +02:00
Michael Peter Christen
48fbfa60c1
bugfix to inbound/outbound identification
2014-04-03 01:21:43 +02:00
reger
227c42bc96
eliminate obsolete URIMetaDataRow class
by merging it into URIMetaDataNode.
2014-04-03 00:35:15 +02:00
Michael Peter Christen
cca851a417
introduced new solr field crawldepth_i which records the crawl depth of
a document. This is an upper limit for the clickdepth_i value, which may
be shorter if the crawler did not take the shortest path to
the document.
2014-04-02 23:37:01 +02:00
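Clickdepth is the length of the shortest link path from the start page, and breadth-first search computes this directly. A crawler that reached a page by a longer path records a larger crawl depth, which is why crawldepth_i can only be an upper bound. The following is a minimal Python sketch over a hypothetical in-memory link map. The `clickdepth` function is illustrative and is not YaCy's postprocessing code.

```python
from collections import deque


def clickdepth(links, root):
    """Breadth-first search from root over a {page: [linked pages]} map.

    Returns the minimum number of clicks needed to reach each reachable page."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in depth:  # first visit in BFS is the shortest path
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth
```

In a graph where `/deep` is linked from both `/b` and `/a/1`, its clickdepth is 2 via `/b`. This holds even if the crawler first reached it through `/a` and `/a/1` at depth 3.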
orbiter
b1ba764d81
fix for first start options and added German translation for popup texts
2014-04-02 17:10:59 +02:00
orbiter
429a874222
- added COLS field in GSA response (not part of the GSA standard; added at customer
request)
- updated document link in GSA response writer
2014-04-02 16:05:44 +02:00
Michael Peter Christen
1b9ec9a1c5
- added popover to the p2p/stealth mode button to explain the peer mode and
privacy issues.
- added popover to the first-time use case to explain that specific servlets
are only visible after customization and/or crawl starts
2014-04-02 13:33:43 +02:00
Michael Peter Christen
62a36fa584
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2014-04-02 03:27:08 +02:00
reger
c9f92abddc
fix: application link count
(URIMetadataNode)
2014-04-02 03:21:51 +02:00
Michael Peter Christen
a267c46e1a
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2014-04-02 02:35:58 +02:00
Michael Peter Christen
5b83887da8
NPE fix
2014-04-02 02:34:55 +02:00
Michael Peter Christen
63c9fcf3e0
free configuration of postprocessing clickdepth maximum depth and time
2014-04-02 02:34:39 +02:00
Michael Peter Christen
39b641d6cd
added tutorial mode - some menu items will only appear if you 'qualify'
for them. Thus, the first-time user will only see four menu items. The
other items will unfold as the user interacts.
2014-04-02 02:33:17 +02:00
sixcooler
f06775850f
fix receiving DHT / parse multipart
+ another close to fix a possible resource leak warning
2014-04-02 01:24:15 +02:00
reger
49e76a1c55
make use of detected charset in htmlParser if none is given.
2014-04-01 04:02:34 +02:00
reger
e11504309f
add a hint about the javascript browser shortcut to the Url-Proxy page (AugmentedBrowsing_p.html)
2014-03-30 05:11:42 +02:00