Commit Graph

34 Commits

Author SHA1 Message Date
Michael Peter Christen
788288eb9e added the generation of 50 (!!) new solr field in the core 'webgraph'.
The default schema uses only some of them and the resting search index
has now the following properties:
- webgraph size will have about 40 times as much entries as default
index
- the complete index size will increase and may be about the double size
of current amount
As testing showed, not much indexing performance is lost. The default
index will be smaller (moved fields out of it); thus searching
can be faster.
The new index will cause that some old parts in YaCy can be removed,
i.e. specialized webgraph data and the noload crawler. The new index
will make it possible to:
- search within link texts of linked but not indexed documents (about 20
times of document index in size!!)
- get a very detailed link graph
- enhance ranking using a complete link graph

To get the full access to the new index, the API to solr has now two
access points: one with attribute core=collection1 for the default
search index and core=webgraph to the new webgraph search index. This is
also avaiable for p2p operation but client access is not yet
implemented.
2013-02-22 15:45:15 +01:00
Michael Peter Christen
d1cb4cbc84 enhanced network scanner, is faster and more flexible now
- start more processes
- remove superfluous host name resolution
- better/more flexible subnet ip range calculation
- prefer ipv4 makes better usable ip pre-settings in servlet
- extended servlet by new subnet /20 - option
- redesign of scanner start process in servlet (generalization)
2013-02-02 09:51:43 +01:00
Michael Peter Christen
21fe8339b4 - enhanced generation of url objects
- enhanced computation of link structure graphics
- enhanced collection of data for link structures
2012-10-15 13:17:13 +02:00
Michael Peter Christen
5f0ab25382 removed the option to prevent removal of & parts inside of the
MultiProtocolURI during normalform computation because that should
always be done and also be done during initialization of the
MultiProtocolURI Object. The new normalform method takes only one
argument which should be 'true' unless you know exactly what you are
doing.
2012-10-10 11:46:22 +02:00
Michael Peter Christen
8219a445f3 refactoring 2012-09-21 16:46:57 +02:00
Michael Peter Christen
00c1c777fa refactoring 2012-09-21 15:48:16 +02:00
orbiter
bbfa497a3c replaced more size() > 0 by !isEmpty() 2012-07-12 11:12:21 +02:00
orbiter
0cbda0b2b8 - replaced all length() == 0 and size() == 0 with isEmpty()
- replaced some length() > 0 and size() > 0 with !isEmpty() - cannot be
done automatically
- implemented some isEmpty() methods
2012-07-10 22:59:03 +02:00
Michael Peter Christen
0301aba1e9 removed unused method parameters 2012-07-05 10:23:07 +02:00
Michael Peter Christen
d3964253ae - added @SuppressWarnings to unused servlet method parameters
- removed unnecessary casts
- removed unnecessary throw statements
2012-07-05 09:14:04 +02:00
Michael Peter Christen
96aeb127e3 generalized localhost naming.
this is also a preparation for a better IPv6 implementation.
2012-06-26 00:08:25 +02:00
Michael Peter Christen
992dbdf4bb added noload statistic to servlets 2012-01-05 18:33:05 +01:00
Michael Christen
0d1042363c showing /16 and /24 subnet option instead of 'bigrange' in network
scanner
2011-12-06 11:37:05 +01:00
orbiter
5a55397f99 some last-minute performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8101 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-25 11:23:52 +00:00
orbiter
d2ea250d99 refactoring:
- moved many classes from de.anomic to net.yacy
- made more sub-packages for search classes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7973 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-25 16:59:06 +00:00
orbiter
8e03b8ee8b better integration of server list in interactive search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7870 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-12 12:25:45 +00:00
orbiter
4bea3f9714 hack to reduce resource contention caused by massive UTF8 decodings which use java.nio resources:
used a ASCII String <-> byte[] conversion wherever possible. Many Strings in YaCy are hashes which are pure ASCII (base64 hashes).
The new ASCII String <-> byte[] conversion method have less computation overhead than the UTF8 conversion.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7746 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-27 08:24:54 +00:00
orbiter
d326f1486a added timeout setting to scanner interface
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7723 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-14 11:30:41 +00:00
orbiter
5c981762c6 added bigrange option for network scan
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7721 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-14 09:13:16 +00:00
orbiter
bade61696f speed-up of network port scanner
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7719 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-14 09:03:16 +00:00
orbiter
5b579e21a3 code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7713 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-13 06:21:40 +00:00
low012
2861d0888a *) simplified code\n*) fixed potential NumberFormatExceptions
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7600 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-15 01:03:35 +00:00
orbiter
cb1f49d0f2 replaced all 'new String' with default encoding (missing) or UTF-8 encoding with a String generation method that uses a pre-defined Charset constant for UTF-8. This avoids a cache-lookup for the Charset object using String hashing of the String 'UTF-8'.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7558 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-07 20:36:40 +00:00
orbiter
88773e4daa changed the default port from 8080 to 8090
see also: http://forum.yacy-websuche.de/viewtopic.php?p=21683#p21683

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7454 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-28 10:54:13 +00:00
orbiter
10ae8d961b - cora package has now no dependencies to other yacy packages and becomes a 'base' package (refactoring)
- cleaned up (removed special code and documentation for 27c3)
- added remote search functions to be used within cora

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7420 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-03 20:52:54 +00:00
orbiter
b2ed4cfaf8 more small bugfixes and light refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7401 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-28 01:57:05 +00:00
orbiter
903c824c2c - allow only scanned resourced with granted status
- increased time-out when scanning an ip range

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7398 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-27 23:11:56 +00:00
orbiter
e38217fe88 small changes to scanner
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7393 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-26 23:21:34 +00:00
orbiter
fe46536f6e enhanced network scanner (less name resolving during scanning and no name resolving during search)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7392 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-26 16:25:17 +00:00
orbiter
58b59f9bc8 - a collection of bug fixes and some redesign of the Scanner class
- fixed smb crawling
- added smbget to download script generation

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7381 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-16 23:37:21 +00:00
orbiter
c288fcf634 redesigned CrawlStartScanner user interface and added more features:
- multiple hosts for environment scans can be given (comma-separated)
- each service (ftp, smb, http, https) for the scan can be selected
- the scan result can be accumulated or refreshed each time a network scan is made
- a scheduler was added to repeat a scan and add all found urls to the indexer automatically

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7378 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-16 02:15:20 +00:00
orbiter
358feeeb39 enhanced speed and usability of network scanner servlet
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7373 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-14 12:12:13 +00:00
orbiter
99a7fe87f9 - removed old intranet scanner (the generic scanner now completely subsumes the old one)
- added information about granted access
- enhanced servlet design
- added submit-feedback (because it is a long-running task)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7372 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-14 01:14:15 +00:00
orbiter
acab6801d9 added new network scanner
- you can scan any ip or host in the internet for services
- this replaces the intranet scanner

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7371 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-13 18:19:37 +00:00