Michael Peter Christen
ed39ef2890
changed generation of protocol information
2012-02-01 18:12:59 +01:00
Michael Peter Christen
0b67a0a5d8
added a column index for tables in blob files. This is heavily used
during receiving of DHT submissions and when answering remote search
requests. Both events together may have caused IO-deadlocking and this
commit shall fix that.
2012-02-01 15:11:21 +01:00
Michael Peter Christen
2e5cd6a1b2
fixed parser extension deny list generation and usage
2012-02-01 00:15:59 +01:00
Michael Peter Christen
8bee1472c9
there is no noindex, only nofollow in links
2012-01-31 23:46:35 +01:00
Michael Peter Christen
3cd6dcd352
do not add new solr fields as activated fields
2012-01-31 22:21:48 +01:00
Michael Peter Christen
e3bb73c3d6
serialized some database access methods
2012-01-31 21:13:49 +01:00
Michael Peter Christen
7e728867e5
added a synchronization around iterations to prevent IO-deadlocking
during concurrent remote search requests
2012-01-31 18:17:25 +01:00
Michael Peter Christen
355ecf330f
reduced target file size to 64mb
2012-01-29 20:35:48 +01:00
Michael Peter Christen
10ae6d94a1
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2012-01-26 18:11:06 +01:00
Michael Peter Christen
2ea585d616
fix for host navigator
2012-01-26 18:10:34 +01:00
Michael Peter Christen
2f6dde92e2
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2012-01-26 16:45:33 +01:00
Michael Peter Christen
c560a582ac
fix for single-word vocabulary lines
2012-01-26 16:44:30 +01:00
Michael Peter Christen
4c5edab1ec
added option to have exception search result windows
2012-01-26 15:32:30 +01:00
Michael Peter Christen
046d7de95b
Merge remote branch 'reger/master'
2012-01-26 10:47:40 +01:00
reger
a95f645a61
Bugfix class repository.Loaddispatcher fixed download file limit of 10000
line 355: final Response response = this.load(request, cachePolicy, 10000, true);
2012-01-26 04:10:44 +01:00
Michael Peter Christen
ef78f22ee1
performance hack
2012-01-25 12:48:48 +01:00
Michael Peter Christen
41536eb4a2
performance hack
2012-01-25 12:28:56 +01:00
Michael Peter Christen
f91487fc50
added delete-button for host navigation
2012-01-25 11:19:18 +01:00
Michael Peter Christen
e8d24fd802
author navigator can be switched off
2012-01-25 11:11:42 +01:00
Michael Peter Christen
558ab7bd4e
made the protocol navigator reversible
2012-01-25 02:54:52 +01:00
Michael Peter Christen
96cb75f1d4
made the filetype navigator be able to deselect the search constraint
2012-01-25 02:50:06 +01:00
Michael Peter Christen
1f4f60654a
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
Conflicts:
source/net/yacy/document/parser/pdfParser.java
2012-01-24 20:42:30 +01:00
reger
32104360ce
PDFParser - return at least first 3 pages of PDF
fix for pdf parsing without returning parsed text due to interruption by
time out.
2012-01-23 20:58:36 +01:00
Michael Peter Christen
ef5192f8c9
using the generic document parser for crawl starts instead of the html
parser. This makes it possible that every type of document can be a
crawl start point, not only text documents or html documents. Tested
this with a pdf document.
2012-01-23 17:27:29 +01:00
Michael Peter Christen
a02fdf8625
better error messages
2012-01-23 00:47:25 +01:00
Michael Peter Christen
eadb58dd87
small enhancements in pdf parser
2012-01-23 00:46:02 +01:00
Michael Peter Christen
c6ba44468e
timeout = 5000 instead of 3000
2012-01-23 00:45:32 +01:00
reger
b616de5973
PDFParser - return at least first 3 pages of PDF
fix for pdf parsing without returning parsed text due to interruption by time out.
2012-01-21 03:15:12 +01:00
Lotus
c73af39e54
refactoring of tray icon class,
now uses Java 6 methods natively
2012-01-18 20:47:09 +01:00
Michael Peter Christen
4eff0e26f1
npe bugfix
2012-01-17 23:39:57 +01:00
low012
8776b84c10
*) small fix to make password change function of reconfigureYACY.sh work
again
2012-01-17 20:43:19 +01:00
Michael Peter Christen
1a0b6b3913
get more navigation details to search results
2012-01-17 16:44:30 +01:00
Michael Peter Christen
7f9b6b7a0c
added switches to ConfigParser to accept/deny documents by their
extension
2012-01-17 16:43:34 +01:00
Michael Peter Christen
4901cee3cc
suppress auto-tagged subject entries when sending out or receiving
metadata from other peers
2012-01-17 02:10:05 +01:00
Michael Peter Christen
83009d86f7
added the vocabulary navigator. It can be very simply tested by
switching on the locale dictionaries.
2012-01-17 01:53:08 +01:00
sixcooler
985b78cf89
correct 'avaiable()' to use max of young / eden
2012-01-16 16:59:58 +01:00
sixcooler
4da8746275
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2012-01-16 01:48:36 +01:00
sixcooler
c9aaa9e00a
respect non-reserved Memory in GenerationMemoryStrategy
and enable it again
2012-01-16 01:46:12 +01:00
Michael Peter Christen
37f2d1b3e9
replaced Thread initialization with ExecutorService pool for delete
method. This is much faster and produces less blocking when using the
Compressor class which is used by the HTCache. I.e. picture search is
much faster now.
2012-01-16 01:05:30 +01:00
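The change described in commit 37f2d1b3e9 (pooled workers instead of one new Thread per delete) could look roughly like the following. This is only a sketch of the general technique: the class name `DeletePoolSketch`, the pool size and the counter are hypothetical and not taken from the YaCy sources.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration, not the actual Compressor/HTCache code.
final class DeletePoolSketch {
    // A small fixed pool is reused for every delete instead of starting a new Thread each time.
    private final ExecutorService pool = Executors.newFixedThreadPool(2);
    // Counts finished deletes; stands in for the real removal work.
    final AtomicInteger deleted = new AtomicInteger();

    // Returns immediately; the delete runs on a pooled worker thread.
    void deleteAsync(final String key) {
        pool.execute(new Runnable() {
            @Override
            public void run() {
                // Assumption: the real implementation would remove 'key' from its store here.
                deleted.incrementAndGet();
            }
        });
    }

    // Stops accepting new tasks and waits for the queued deletes to finish.
    boolean shutdownAndWait(final long millis) {
        pool.shutdown();
        try {
            return pool.awaitTermination(millis, TimeUnit.MILLISECONDS);
        } catch (final InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The caller is not blocked by thread start-up, which matches the "less blocking" effect mentioned in the commit message.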
Michael Peter Christen
a58dc4a91f
added autotagging to document condenser:
- tags that are automatically generated now enrich the dc:subject
- auto-generated tags have a '$' at the beginning of the tag
- auto-generated tags lead the tag name with a vocabulary name
each tag has the form
$<vocabulary-name>:<tag-printname-space-replaced-by-'_'>
2012-01-15 22:17:57 +01:00
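The tag form given in commit a58dc4a91f can be illustrated with a small helper. This is a sketch under assumptions: the class `AutoTagSketch`, its method names and the vocabulary name `geo` are made up for illustration and are not the identifiers used in YaCy.

```java
// Hypothetical helper, not YaCy's actual code. It builds tags of the form
// $<vocabulary-name>:<tag-printname-space-replaced-by-'_'>
final class AutoTagSketch {
    // Marker that distinguishes auto-generated tags from manually assigned ones.
    static final String PREFIX = "$";

    // Joins vocabulary name and print name, replacing spaces in the print name with '_'.
    static String format(final String vocabularyName, final String printName) {
        return PREFIX + vocabularyName + ":" + printName.replace(' ', '_');
    }

    // True if the tag carries the auto-generated marker.
    static boolean isAutoTag(final String tag) {
        return tag.startsWith(PREFIX);
    }

    public static void main(final String[] args) {
        System.out.println(format("geo", "New York")); // prints $geo:New_York
    }
}
```

A check like `isAutoTag` is also what a peer would need in order to suppress such entries in dc:subject when exchanging metadata, as commit 4901cee3cc describes.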
Michael Peter Christen
0d6176804b
emergency disabling of GenerationMemoryStrategy because of non-working
available-method
2012-01-15 21:58:18 +01:00
Lotus
411aab02e3
Windows installer now detects reliably whether YaCy runs. A file lock on
the yacy.running file has been implemented.
2012-01-15 19:01:05 +01:00
Michael Peter Christen
87f0210480
enriched log output to find NPE in HeapReader
2012-01-15 12:08:46 +01:00
Michael Peter Christen
987b412491
updated solr scheme: generic declaration of solr schemes
2012-01-13 11:25:15 +01:00
Michael Peter Christen
254adea51c
small fixes
2012-01-13 11:24:08 +01:00
Michael Peter Christen
49be60a7c8
WorkflowProcess is forced to make small pauses if shortMemoryStatus is
reached.
2012-01-10 03:03:12 +01:00
Michael Peter Christen
b7bb84c0bb
set a limit to CharBuffer object size to fight against bad/too large
content
2012-01-10 03:02:17 +01:00
Michael Peter Christen
c602eaaf46
enhanced search process
2012-01-10 03:00:55 +01:00
Michael Peter Christen
087f97d4c0
less noise if a browser cannot be opened
2012-01-09 20:54:14 +01:00
Michael Christen
eff966f396
fix for search process (it was aborted too early during remote search)
2012-01-09 03:02:35 +01:00
Michael Christen
e6d51363ee
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2012-01-09 02:00:09 +01:00
Marek Otahal
a231d0eeb9
Run from Java the whole app YACY
start for java webStart
allow for better integration with IDE
Conflicts:
source/net/yacy/gui/framework/Browser.java
2012-01-09 01:49:37 +01:00
Marek Otahal
72adbeae90
!Important: move from Hashtable to HashMap
Hashtable is an obsolete v1 collection; since v2, HashMap offers the same or better
functionality. Please review; almost all code had already been moved, so there are only a few changes. That is not the issue,
but I found notes that some (ugly, big) helper classes had to be created in the past
to compensate for functionality missing from Hashtable. I'd like input on whether we can remove some of them.
Look for //FIX: in these commits
Signed-off-by: Marek Otahal <markotahal@gmail.com>
2012-01-09 01:29:18 +01:00
Marek Otahal
f40efb39af
Blacklist loadList() remove duplicates by using Set
Signed-off-by: Marek Otahal <markotahal@gmail.com>
2012-01-09 01:18:01 +01:00
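Removing duplicates by collecting entries in a Set, as commit f40efb39af does for Blacklist loadList(), can be sketched as follows. The class `BlacklistLoadSketch`, the method `dedupe` and the choice of `LinkedHashSet` (to keep file order) are assumptions for illustration; the actual loadList() signature is not shown in this log.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration, not the actual Blacklist code.
final class BlacklistLoadSketch {
    // Returns the non-empty lines without duplicates, in order of first occurrence.
    static List<String> dedupe(final List<String> lines) {
        final Set<String> seen = new LinkedHashSet<String>();
        for (final String line : lines) {
            final String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                seen.add(trimmed); // add() silently ignores entries that are already present
            }
        }
        return new ArrayList<String>(seen);
    }
}
```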
Marek Otahal
f75b5e40e0
little fix in copy()
Signed-off-by: Marek Otahal <markotahal@gmail.com>
2012-01-09 01:16:46 +01:00
Marek Otahal
1dc5d9f0f3
make ConnectionInfo comparable and sort list of connections in Connections_p
ConnectionInfo compare by initTime
Connections_p implement wish to sort connections, descending
Signed-off-by: Marek Otahal <markotahal@gmail.com>
2012-01-09 01:14:41 +01:00
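The idea in commit 1dc5d9f0f3 (compare by initTime, then list connections in descending order) might be sketched like this. Only the comparison by `initTime` comes from the commit message; the class `ConnectionInfoSketch`, the `target` field and the helper `newestFirst` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for ConnectionInfo; the field names are assumptions.
final class ConnectionInfoSketch implements Comparable<ConnectionInfoSketch> {
    final String target;
    final long initTime;

    ConnectionInfoSketch(final String target, final long initTime) {
        this.target = target;
        this.initTime = initTime;
    }

    // Natural order: older connections (smaller initTime) come first.
    @Override
    public int compareTo(final ConnectionInfoSketch other) {
        return Long.compare(this.initTime, other.initTime);
    }

    // Returns a sorted copy in descending order, i.e. the newest connection first.
    static List<ConnectionInfoSketch> newestFirst(final List<ConnectionInfoSketch> connections) {
        final List<ConnectionInfoSketch> sorted = new ArrayList<ConnectionInfoSketch>(connections);
        Collections.sort(sorted, Collections.<ConnectionInfoSketch>reverseOrder());
        return sorted;
    }
}
```

Sorting a copy leaves the original list untouched, which is the safer choice if other threads still read it.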
Michael Christen
fa8da7f89d
vocabularies are now also used as source for a did-you-mean computation
2012-01-08 02:13:52 +01:00
Michael Christen
eaec14ecc4
Dictionaries from words caches can now be used as autotagging vocabulary
2012-01-08 02:07:10 +01:00
Michael Peter Christen
91940fdf56
redesign of WordCache to be prepared to hold multiple
independent dictionaries. Such dictionaries can then be also used as
simplified vocabularies.
2012-01-08 00:47:32 +01:00
Michael Christen
bd40a10230
added autotagging stub .. only reading and parsing of vocabularies at
this time
2012-01-07 17:34:38 +01:00
Michael Peter Christen
2ee8cbeb2c
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
Conflicts:
source/net/yacy/search/Switchboard.java
2012-01-05 18:37:46 +01:00
Michael Peter Christen
992dbdf4bb
added noload statistic to servlets
2012-01-05 18:33:05 +01:00
Michael Christen
eebc02f5c1
fix
2012-01-04 20:24:48 +01:00
Michael Christen
216a287a85
Merge commit '6d4e08ed06c5cd28c45981b2ebe31c7f7ec6fd83' into quix0r
Conflicts:
source/de/anomic/crawler/CrawlQueues.java
2012-01-04 20:16:37 +01:00
stbrumm
d18095dc48
Patch for Issue 0000102
and fixes to Patch (private peer status is a property of a peer, not a
status)
2012-01-03 17:49:37 +01:00
stbrumm
9f1b1b4604
Type for Robinson-Mode/Private Peer added
2012-01-03 14:43:17 +01:00
Michael Christen
20962a4ed7
added metadata node stub for metadata from blobs
2012-01-03 14:38:03 +01:00
Michael Christen
575dbbaa93
enhancements in Blob retrieval: try to use less CPU resources by testing
a blob first that most certainly has wanted entries.
2012-01-02 02:14:05 +01:00
Michael Christen
585a8f3c44
fixed a bug in search sequence (caused empty results)
2012-01-02 02:10:39 +01:00
Michael Christen
361146dd7a
better error handling for file loader
2011-12-29 14:37:19 +01:00
Roland 'Quix0r' Haeder
6d4e08ed06
Rewrote filesize() to (hopefully) avoid a NPE, rewrote Blacklist class to concurrent classes to avoid a CME
2011-12-29 03:42:38 +01:00
Roland 'Quix0r' Haeder
fa08ed5ae5
Fixed a lot of CHMOD rights (no need for execute flag on *.java/*.html) and introduced local/remote crawl size ratio based check
2011-12-29 00:33:16 +01:00
Roland Haeder
319fd1f4aa
A concurrent access can happen on the blacklist (with latest introduced blacklist check in media snippet computation)
2011-12-28 21:40:44 +01:00
Roland 'Quix0r' Haeder
a3083d13bf
Blacklist checks are now always turned on, in media searches (e.g. image search) images matching blacklist entries are no longer shown to the user
2011-12-28 20:09:17 +01:00
Michael Christen
52184a1170
fix for search process
2011-12-27 23:43:44 +01:00
Michael Christen
85bd4cc8bc
better lookup for peer names
2011-12-25 10:14:15 +01:00
Michael Christen
20e3084bd4
redesign of finding of peers by ip: more lightweight method to read the
seed databases
2011-12-21 01:14:43 +01:00
Michael Christen
0797b0de99
new handling of remote search processes: looking for seeds will now not
block the whole search process any more. A deadlock with a DHT selection
process may have been the cause for interface lockings in the past.
2011-12-21 00:32:03 +01:00
Michael Christen
ee9aae5cc0
more about CreativeCommons license vocabulary
2011-12-18 16:07:51 +01:00
Michael Christen
ecd74fe34f
less dramatic upnp failures
2011-12-18 09:54:08 +01:00
Michael Christen
c75e1a3125
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2011-12-18 08:20:10 +01:00
Michael Christen
13f5b5f80d
the component part in the YaCy Metadata is filled using the Dublin Core
vocabulary
2011-12-18 08:19:48 +01:00
Michael Peter Christen
8d2cbfb685
more vocabularies and more semantics for lod data structures
2011-12-18 08:12:34 +01:00
Michael Christen
9cd36b4c44
added vocabulary for geolocalization as used in georss
2011-12-17 02:03:45 +01:00
Michael Christen
9e5894c784
Removed handling of components objects for URIMetadataRows.
This is a preparation to replace these rows with nodes from the node
store.
2011-12-17 01:27:08 +01:00
Michael Christen
66ab51f89d
added rdf vocabulary
2011-12-17 01:09:16 +01:00
Michael Christen
c04bfaa51b
refactoring
2011-12-16 23:59:29 +01:00
Michael Peter Christen
136b514f52
added a Triple Store based on Nodes that fit to the new storage classes.
Added also a first Vocabulary for the node store - Dublin Core.
2011-12-16 23:01:47 +01:00
Michael Peter Christen
613ab6a69d
added BEncodedHeapBag and BEncodedHeapShard which are storage containers
for a new metadata store. An abstraction of the content for this storage
is defined with MapStore. A MapStore is an abstraction of an RDF Node
store.
2011-12-16 23:00:50 +01:00
Michael Christen
6fecd0db88
one more performance hack to prevent costly md5 computation
2011-12-15 23:33:41 +01:00
Michael Christen
e13441b069
better digest pool size (smaller by default but unlimited)
2011-12-15 17:45:46 +01:00
Michael Christen
1f4afb4dc0
performance hacks
2011-12-15 15:15:53 +01:00
Michael Christen
675d557e88
removed debug logging
2011-12-14 22:21:19 +01:00
Michael Christen
e9dc99fe15
added rules to set specific RWIs as private RWIs which are not
transmitted to remote peers. This will be used for private index copies
and phonetic indexes.
2011-12-14 22:15:51 +01:00
Michael Peter Christen
4243ace863
added phonetic classes
2011-12-14 17:33:18 +01:00
Michael Peter Christen
0bcef2d156
added feature as requested in
http://forum.yacy-websuche.de/viewtopic.php?f=18&t=3461
The search can now be configured with a non-display host list.
The search will always exclude the given list of hosts unless they are
requested directly using the host navigation
2011-12-13 00:16:05 +01:00
Michael Christen
204c29f010
small bugfixes for search result display and cache display
2011-12-10 01:35:38 +01:00
Michael Christen
17f962fceb
translator updates:
- config string for chinese
- do not copy the language file to DATA/LOCALE any more (and do not use
them there, this is really confusing for new translators)
2011-12-08 10:25:26 +01:00
Michael Christen
078fcde0dd
bad initialization
2011-12-07 01:02:23 +01:00
Michael Christen
14e45e90fd
patch for a bug that I don't understand by now.
2011-12-07 00:52:04 +01:00