Commit Graph

51 Commits

Author SHA1 Message Date
Michael Peter Christen
5e31bad711 - the webgraph shall store all links which appear on a web page, not
only the unique links. This made it necessary to adapt a large portion
of the parser and link processing classes to carry a different type of
link collection, one that carries the property attributes which are
attached to web anchors.
- introduction of a new URL class, AnchorURL
- the other URL classes, DigestURI and MultiProtocolURI, have been renamed
and refactored to fit into a new document package schema, document.id
- cleanup of net.yacy.cora.document package and refactoring
2013-09-15 00:30:23 +02:00
Michael Peter Christen
765943a4b7 Redesign of crawler identification and robots steering. A non-p2p user
in intranets and the internet can now choose to appear as Googlebot.
This is essential to be able to compete in the field of
commercial search appliances, since most web pages are these days
optimized only for Google and no longer for any other search platform. All
commercial search engine providers have a built-in fake-Google User
Agent to be able to get the same search index as Google. Without
the ability to decline to obey robots.txt in this case, no
competition is possible any more. YaCy will always obey the robots.txt
when it is used for crawling the web in a peer-to-peer network, but to
establish a Search Appliance (like a Google Search Appliance, GSA) it is
necessary to be able to behave exactly like a Google crawler.
With this change, you will be able to switch the user agent on a
per-crawl-start basis when portal or intranet mode is selected. Every
crawl start can have a different user agent.
2013-08-22 14:23:47 +02:00
Michael Peter Christen
336f86394c replaced StringBuffer with StringBuilder 2013-07-23 12:21:27 +02:00
Roland Haeder
841a28ae76 Added 'final' for all exception blocks as this helps the Java compiler
to optimize memory usage

Conflicts:
	source/net/yacy/search/Switchboard.java
2013-07-17 18:31:30 +02:00
Michael Peter Christen
5878c1d599 - refactoring of log to ConcurrentLog:
JDK-based loggers tend to block
at java.util.logging.Logger.log(Logger.java:476) in concurrent
environments. This makes logging a major performance issue. To overcome
this problem, this is an add-on to JDK logging that puts log entries on a
concurrent message queue and logs the messages one by one using a
separate process.
- FTPClient uses the concurrent logging instead of the log4j logger
2013-07-09 14:28:25 +02:00
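
Note on 5878c1d599: the queue-based approach described in that commit message can be illustrated roughly as follows. This is a minimal sketch, not the actual ConcurrentLog code. The class name QueuedLogSketch, its methods, and the poison-pill shutdown are hypothetical, and the sketch assumes that the "separate process" is a single background thread inside the same JVM.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical illustration only; not YaCy's ConcurrentLog. */
final class QueuedLogSketch {

    /** One queued log entry. */
    private static final class Entry {
        final Level level;
        final String message;
        Entry(Level level, String message) {
            this.level = level;
            this.message = message;
        }
    }

    /** Marker entry that tells the writer thread to stop. */
    private static final Entry POISON = new Entry(Level.OFF, "");

    private final BlockingQueue<Entry> queue = new LinkedBlockingQueue<>();
    private final Logger target;
    private final Thread writer;

    QueuedLogSketch(Logger target) {
        this.target = target;
        this.writer = new Thread(this::drain, "log-writer");
        this.writer.setDaemon(true);
        this.writer.start();
    }

    /** Called by worker threads: only enqueues, never touches the JDK logger. */
    void log(Level level, String message) {
        queue.add(new Entry(level, message));
    }

    /** Writer thread: takes entries one by one and hands them to the JDK logger. */
    private void drain() {
        try {
            while (true) {
                Entry e = queue.take();
                if (e == POISON) {
                    return;
                }
                target.log(e.level, e.message);
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
    }

    /** Writes out everything queued so far, then stops the writer thread. */
    void shutdown() {
        queue.add(POISON);
        try {
            writer.join();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In this sketch, callers pay only for an enqueue on an unbounded queue, so contention on the underlying logger is confined to the single writer thread. The trade-off is that entries are written with a delay and may be lost if the JVM exits before shutdown() is called.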
cominch
d2a94cc55e refactor package 2012-11-09 16:22:24 +01:00
cominch
05742b4562 remove old SMW importer which was part of the ymarks package 2012-11-09 15:44:59 +01:00
cominch
21df1ad9e0 update and generalization of the SMW import and content control routines 2012-11-09 13:48:40 +01:00
Michael Peter Christen
a33e2742cb - removed unnecessary synchronized and deadlock in crawler
- removed problem with monitoring object on Balancer.wait
- added missing user agent settings
2012-10-28 19:56:02 +01:00
Michael Peter Christen
5f0ab25382 removed the option to prevent removal of & parts inside of the
MultiProtocolURI during normalform computation because that should
always be done and also be done during initialization of the
MultiProtocolURI Object. The new normalform method takes only one
argument which should be 'true' unless you know exactly what you are
doing.
2012-10-10 11:46:22 +02:00
Michael Peter Christen
00c1c777fa refactoring 2012-09-21 15:48:16 +02:00
cominch
23204d2245 change parameter to support the smw extension for list import 2012-09-20 15:02:57 +02:00
Michael Peter Christen
8c099d2106 Merge remote-tracking branch 'origin/master'
Conflicts:
	htroot/api/ymarks/import_ymark.java
	source/de/anomic/data/ymark/YMarkEntry.java
	source/de/anomic/data/ymark/YMarkTables.java
2012-09-10 07:05:20 +02:00
Michael Peter Christen
a427a68bac removed many warnings 2012-08-31 14:07:33 +02:00
cominch
dc468dad01 add content control features for custom filter lists 2012-08-29 09:04:28 +02:00
Michael Peter Christen
f00733186b code simplifications 2012-08-19 13:17:03 +02:00
cominch
e2119f4e76 augmented browsing: replace htmlparser by jsoup, which is more stable
and reliable
2012-08-14 10:06:12 +02:00
Michael Peter Christen
1687737771 Abstraction of HandleMap and HandleSet 2012-07-27 12:13:53 +02:00
Michael Peter Christen
7c1ba99755 removed more unused method parameters 2012-07-05 10:44:30 +02:00
Michael Peter Christen
0301aba1e9 removed unused method parameters 2012-07-05 10:23:07 +02:00
Michael Peter Christen
ea10766bfd cleaned unnecessary nested code 2012-07-05 08:44:39 +02:00
Michael Peter Christen
96aeb127e3 generalized localhost naming.
this is also a preparation for a better IPv6 implementation.
2012-06-26 00:08:25 +02:00
cominch
c63c3a4495 Show additional interaction elements in footer section on each page, if
activated in ConfigPortal.html.
This footer is also visible in augmented browsing proxy mode.
2012-06-20 18:04:23 +02:00
cominch
e4555cbee3 Augmented browsing: Pass on additional action parameter 2012-06-18 15:44:01 +02:00
Michael Peter Christen
b2d1c25ebb removed warnings/unused entities 2012-06-17 11:22:08 +02:00
cominch
2ac7a5c1f2 Augmented browsing: Add overlay bar which shows the vocabulary tags 2012-06-15 14:32:16 +02:00
cominch
f49d92d8da Cleanup of interaction class and helper routines 2012-06-14 17:41:45 +02:00
Michael Peter Christen
5fc6524ca8 - moved triple store to net.yacy.cora.lod (should be generalized there
later)
- added abstract add, delete, get methods in the triplestore
- added generation of triples after auto-annotation
- migrated all MultiProtocolURI objects to DigestURI in the parser since
the url hash is needed as subject value in the triples in the triple
store
2012-06-11 16:48:53 +02:00
cominch
8d2e6355f8 augmented browsing: remove non-existing external snippet file 2012-06-11 11:40:48 +02:00
cominch
c90f174799 preparation and generalization of augmented browsing methods 2012-06-11 09:23:44 +02:00
Michael Peter Christen
ca93835713 removed usage of deprecated methods 2012-06-10 23:17:21 +02:00
Michael Peter Christen
90c6fc4b63 load all - but not the persistent local.rdf - triples from
DATA/TRIPLESTORE at startup time. The local.rdf is loaded only if the
persistent switch is on (as before).
2012-06-10 21:49:02 +02:00
cominch
aa0295917c augmentation
Conflicts:
	source/net/yacy/interaction/AugmentHtmlStream.java
2012-06-10 13:10:21 +02:00
cominch
ed2ea0f08e augmented browsing modification
Conflicts:
	htroot/interaction/OverlayInteraction.html
	source/net/yacy/interaction/AugmentHtmlStream.java
2012-06-10 13:07:57 +02:00
cominch
6b32f7c1f6 re-enable augmented proxy 2012-06-10 13:04:13 +02:00
cominch
3b08edec2e bugfix
Conflicts:
	source/net/yacy/interaction/AugmentHtmlStream.java
2012-06-10 13:03:56 +02:00
cominch
5f8ba7f4f2 small changes
Conflicts:
	source/net/yacy/document/parser/augment/AugmentParser.java
	source/net/yacy/interaction/Interaction.java
2012-06-10 13:02:00 +02:00
cominch
300b235ce8 Updated Demo Servlet
Conflicts:
	htroot/About.html
	htroot/DemoServlet.html
	htroot/DemoServlet.java
	htroot/interaction/interaction.js
	source/net/yacy/interaction/Interaction.java
2012-06-10 12:58:29 +02:00
cominch
df47f31235 interaction: add special table interaction
Conflicts:
	source/net/yacy/interaction/Interaction.java
2012-06-10 12:43:01 +02:00
cominch
e14f2881ae interaction: add special table interaction
Conflicts:
	source/net/yacy/interaction/Interaction.java
2012-06-10 12:41:16 +02:00
cominch
d7326079a8 interaction: add global variable store
Conflicts:
	source/net/yacy/interaction/Interaction.java
2012-06-10 12:39:27 +02:00
cominch
4e4e7a99f8 interaction: add global variable store
Conflicts:
	source/net/yacy/interaction/Interaction.java
2012-06-10 12:34:36 +02:00
cominch
bde07ed7a8 Add tagging overlay element
Conflicts:
	htroot/env/templates/jqueryheader.template
	htroot/yacysearchitem.java
	source/net/yacy/interaction/Interaction.java
2012-06-10 12:28:50 +02:00
cominch
b0bc0b4572 Add new demonstration module for client-side key-value store (backend:
triplestore): /DemoServletInteraction.html

Conflicts:
	source/net/yacy/interaction/Interaction.java
2012-06-10 10:53:30 +02:00
cominch
c9dc6cda02 Demonstration: include value from interaction in search results
Conflicts:
	htroot/interaction/OverlayInteraction.html
	htroot/yacysearchitem.java
2012-06-10 10:51:53 +02:00
cominch
9ef5a80f4e add interaction for triples and selector for augmented browsing
Conflicts:
	htroot/interaction/interaction.js
	source/net/yacy/interaction/Interaction.java
2012-06-10 10:38:54 +02:00
cominch
282c1620d6 Allow TripleStore to be persistent after reboot 2012-06-10 10:36:16 +02:00
cominch
5d20cd324a Add Triplestore and RDF query interface
Conflicts:
	build.xml
	defaults/yacy.init
	source/net/yacy/interaction/AugmentHtmlStream.java
2012-06-10 10:35:59 +02:00
cominch
bc9a618e0a augmented browsing: ignore js and css, integrate more user interaction
Conflicts:
	htroot/interaction/Footer.html
	source/net/yacy/interaction/AugmentHtmlStream.java
2012-06-10 10:29:15 +02:00
cominch
b21048892b augmentedParser add features and integrate external html parser to
modify existing web pages

Conflicts:
	addon/YaCy.app/Contents/Info.plist
	build.xml
2012-06-10 10:23:35 +02:00