Commit Graph

156 Commits

Author SHA1 Message Date
danielr
bb6a6fc233 fixed 'FileUploadException Stream ended unexpectedly'
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5044 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-09 22:44:17 +00:00
danielr
31d97f2b9f replaced httpd.parseMultipart() by a 'right' implementation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5040 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-08 01:40:28 +00:00
danielr
621b473b18 * removed some warnings of findbugs (http://findbugs.sf.net)
- removed unnecessary code (unused variables, String.toString)
- corrected some calculations (cast int to double or long ;)
- improved little performance (using Integer.valueOf() instead of new Integer)
- log if some File-actions fail (mkdir(), delete(), ...) and some ignored exceptions
- finalized some (more) fields
- finally close some streams
- made inner classes static if not using environment
- generalized some equals (from specificClass to Object)
- fixed some potential nullpointer accesses


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5039 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-06 19:43:12 +00:00
danielr
17b7845eb5 * refactoring
- moved constants from plasmaSwitchboard to own class (all 232 ;)
- moved remoteProxy-Methods to httpRemoteProxyConfig, better names
- removed some unnecessary code (else-statements)
* formatting (correct indentation)
* minor bugfixes (due to findbugs.sf.net)
* hopefully fixed "missing quote" (announcing StringParts as UTF-8)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5031 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-02 13:57:00 +00:00
danielr
3bb870bfcd added final where possible
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5030 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-02 12:12:04 +00:00
orbiter
c3d461d191 - removed superfluous copyright statement
- updated my email address

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5011 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-20 17:14:51 +00:00
danielr
7feae906aa - organize imports
- removed potential null pointer accesses
- removed unnecessary casts


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4893 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-06 16:01:27 +00:00
lotus
ba4091c5b2 proxy sends status code now
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4860 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-27 13:11:06 +00:00
orbiter
0b52ef3e4b - update of grafics
- check for startsWith("127.") instead of equals("127.0.0.0")

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4853 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-25 20:30:37 +00:00
orbiter
11e00a0849 - refactoring of seedURL handling
- additional check for seedURL pointing to localhost: deny such peers

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4851 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-25 18:35:38 +00:00
orbiter
c1d721dd2d fix for attacks on localhost-authorized peers from web pages with links to localhost addresses:
checking of referer in access

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4828 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-19 22:17:53 +00:00
orbiter
1127d62b64 some enhancements to the access tracker (less synchronization)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4812 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-16 23:33:59 +00:00
orbiter
3bd1db776a implemented switch for admin authorization from localhost:
- access is granted for localhost users to administration pages by default
- the default setting can be changed in the BasicConfig.html page
- if the BasicConfig page was accessed with post and no password was submitted, a random password is generated
- a headless installation MUST give a password upon first call of the configuration page, otherwise they will not be able to access it again
- if no password is given within 10 minutes after start-up, a random password is generated

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4804 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-15 11:26:43 +00:00
orbiter
d2ba1fd2ab major step forward to network switching (target is easy switch to intranet or other networks .. and back)
This change is inspired by the need to see a network connected to the index it creates in a indexing team.
It is not possible to divide the network and the index. Therefore all control files for the network was moved to the network within the INDEX/<network-name> subfolder.
The remaining YACYDB is superfluous and can be deleted.
The yacyDB and yacyNews data structures are now part of plasmaWordIndex. Therefore all methods, using static access to yacySeedDB had to be rewritten. A special problem had been all the port forwarding methods which had been tightly mixed with seed construction. It was not possible to move the port forwarding functions to the place, meaning and usage of plasmaWordIndex. Therefore the port forwarding had been deleted (I guess nobody used it and it can be simulated by methods outside of YaCy).
The mySeed.txt is automatically moved to the current network position. A new effect causes that every network will create a different local seed file, which is ok, since the seed identifies the peer only against the network (it is the purpose of the seed hash to give a peer a location within the DHT).
No other functional change has been made. The next steps to enable network switcing are:
- shift of crawler tables from PLASMADB into the network (crawls are also network-specific)
- possibly shift of plasmaWordIndex code into yacy package (index management is network-specific)
- servlet to switch networks 

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4765 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-05 23:13:47 +00:00
danielr
d7b21bc90c re-added gzip POST for transferRWI/URL (HTTP/1.1 compliant)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4761 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-04 10:53:04 +00:00
danielr
5c3c1fdf41 replaced httpc with Apache Jakarta Commons HttpClient (includes some refactoring ;)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4640 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-05 13:17:16 +00:00
orbiter
541b817502 refactoring of switchboard queueing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4591 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-22 01:28:37 +00:00
orbiter
fa1090113d - next try to fix the networking problem:
set the maximum transfer size to less than MTU=1500-52: buffer size <= 1448
- some refactoring of transfer methods (naming)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4558 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-14 00:16:04 +00:00
orbiter
7cc4ff05c9 some code enhancements and bugfixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4542 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-09 23:48:24 +00:00
borg-0300
3445b1e10b *better logging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4526 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-05 13:41:54 +00:00
borg-0300
4b0339fec0 *fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=927
*remove some cast
*Properties added

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4525 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-05 13:29:42 +00:00
orbiter
bd63999801 - faster search: using different data structures that avoid multiplr calculations
- no more table copy for error-eco table
- optional table copy for lurl-entries
- more abstractions (less single constant strings)
- better logging (using host names instead of ips)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4459 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-07 22:16:36 +00:00
orbiter
4a80902081 - added ViewProfile as rdf in foaf syntax
- added link to rdf and vCard version on html page
- can be seen on http://localhost:8080/ViewProfile.html?hash=localhash
- more generics

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4411 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-28 18:21:08 +00:00
orbiter
03e7782269 more generics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4305 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-06 19:23:38 +00:00
orbiter
df2a7a8ac8 more generics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4295 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-28 18:47:45 +00:00
fuchsi
21b8d1b918 small cosmetic change for static fields in serverCore (special protocol ASCII entities) to improve readability
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4275 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-14 19:17:54 +00:00
orbiter
ca488e03f5 fixed authorization case
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4262 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-10 02:04:48 +00:00
orbiter
e22014dc83 some memory enhancements when generating and displaying ymage objects
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4253 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-07 02:15:12 +00:00
low012
383dc815d2 *) fix for commit 4212
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4217 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-14 19:14:53 +00:00
fuchsi
0e1738899f * Complete number localization and provide a more reasonable interface to serverObjects:
- put(key, value) methods are now used if a value added to the map should be kept as it is. Numbers are transformed (but not formatted) to an equivalent String representation.
- putASIS(...) have been removed, now done with simple put(...) (see above).
- puNum(...) can be used for number values which should be stored in a formatted way, either depending on the current locale setting for yacy (default) or in a "none" locale (see javadocs and setLocalize()).
- putHTML(...) escapes special characters into corresponding HTML enities ('<' => '&lt;') which was done with put(...) before and so was called too often, becauses it is necessary only for very few cases. Additionally there is a "forXML" mode which only replaces < > & ".
In short: Use put(...) for almost everything, use putXY(...) if you need some special transformation of the value.
A few bugs have been fixed as well, and there should be a small performance improvement for complex pages with a lot of values.

* added additional Sum/Avg rows to access tracker pages, see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=456
* removed duplicate code (mostly related to the big changes above).

TODO:
- make sure, number formats work as expected _everywhere_, report overseen stuff http://forum.yacy-websuche.de/viewtopic.php?f=5&t=437
- probably a good idea to add special putDate() methods as they are used in many pages and create duplicated formatting code + maybe some centralized handling for memory value formatting.
- further improve the speed of page creation for the WatchCrawler.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4178 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-24 21:38:19 +00:00
orbiter
01e0669264 re-designed some parts of DHT position calculation (effect is the same as before)
and replaced old fist hash computation by new method that tries to find a gap in the current dht
to do this, it is necessary that the network bootstraping is done before the own hash is computed
this made further redesigns in peer initialization order necessary

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4117 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-01 12:30:23 +00:00
orbiter
daf0f74361 joined anomic.net.URL, plasmaURL and url hash computation:
search profiling showed, that a major amount of time is wasted by computing url hashes. The computation does an intranet-check, which needs a DNS lookup. This caused that each urlhash computation needed 100-200 milliseconds, which caused remote searches to delay at least 1 second more that necessary. The solution to this problem is to attach a URL hash to the URL data structure, because that means that the url hash value can be filled after retrieval of the URL from the database. The redesign of the url/urlhash management caused a major redesign of many parts of the software. Since some parts had been decided to be given up they had been removed during this change to avoid unnecessary maintenance of unused code.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4074 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-05 09:01:35 +00:00
orbiter
24e25e1141 enhanced SSI server-side support:
- SSIs may now refer to servlets, not only files
- calling a servlet, the servlet/SSI engine is called recursively
- SSIs now work also for non-chunked-encoding supporting clients
This will support the new search page functionality, to show search results
dynamically without using javascript. To test this method, a test page has been added
http://localhost:8080/ssitest.html
..calls dynamicalls 3 servlets, which produce some delays during their execution
please verify that you can see the result step-by-step on your browser
To implement this feature, some refactoring had been taken place, mostly code
had been made static and will execute faster.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4037 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-09 21:58:38 +00:00
orbiter
9ca46a8c69 indexing of local (intranet) urls enabled
To do this, one must create a separate YaCy network that has a local URL domain
A description how to do this is here: http://www.yacy-websuche.de/wiki/index.php/De:Netzdefinition

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4001 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-24 00:46:17 +00:00
orbiter
40b0547611 - documentaton changes (removed old forum links)
- different handling of link quotation
- different handling of link normalization
- enhanced html/unicode en/de-coding

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3993 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-19 15:32:10 +00:00
orbiter
26ddf797eb added bmp and ico image format to all parser/viewing methods
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3969 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-15 12:55:41 +00:00
orbiter
154ffd7c2c fix for wrong http connection version and SSIs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3928 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-26 15:06:23 +00:00
orbiter
1782ef57e5 - added SSI parser and include directive for <!--# include virtual="<file>" -->
- added chunked file transfer for non-yacy clients
- SSIs are streamed using chunked transfer, partly delivered pages can be seen in browser before transmission is finished
- added client-side network unit identification
- cleaned up code

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3926 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-26 14:37:10 +00:00
allo
465145cb6f revert to insecure, but dau-proof defaults
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3898 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-15 12:56:52 +00:00
allo
7ad11ceaaa security fix for peers without password. allow access only from localhost
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3897 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-15 00:03:44 +00:00
orbiter
66ec8b63c1 added a httpd access tracker:
- all requests to the own httdp can now be listed in the access tracker menu
- the search statistics had been renamed to access tracker and extended by this tracker

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3861 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-11 14:05:20 +00:00
orbiter
33ad0c8246 added a web structure computation and logging:
- all web page parsing operations will now increase a web structure file
- the file is computed in memory and dumped at shutdown-time to PLASMASB/webStructure.map in readable form (not a database)
- the file can be used externally to analyse the link structure of the crawled pages
- the web structure can also be retrieved using a xml-interface at http://localhost:8080/xml/webstructure.xml
- the short-term purpose is the computation of a link-graph image (before linuxtag!)
- a long-term purpose could be a decentralized computation of the citation rank



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3746 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-22 08:13:48 +00:00
karlchenofhell
601fc7d1c5 - added source to J7Zip-modifed.jar and it's license (changelog is still to come)
- moved HTML-*replace-methods from wikiCode to de.anomic.data.htmlTools
- prepared use of different wiki parsers as suggested here: http://www.yacy-forum.de/viewtopic.php?p=34444#34444

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3741 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-20 13:29:12 +00:00
orbiter
d755a8026d - better OOM protection
- better memory allocation for FlexTable indexes
- splitting between static index and dynamic index (only the dynamic part must grow)
- to enable a merge-iteration of new splittet index, a huge number of classes needed to be adopted for new iterator classes
- added new iterator classes that support cloneable iterators
- adopted all iterator classes to implement cloneable itarators

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3453 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-08 16:15:40 +00:00
orbiter
bf69a721cb more protection against mis-use of YaCyHop interface:
- target must not be at port 80
- target access not more than every 3 seconds
- requester may not access more than every 10 seconds

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3357 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-09 15:25:10 +00:00
orbiter
b4aa195c27 added user-agent check for yacy-hop proxy authentication
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3343 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-06 09:53:02 +00:00
orbiter
d25caa07bf redesigned some parts of http authentication
added another access check for peer hops

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3340 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-05 19:46:50 +00:00
karlchenofhell
2401e748a3 - fixed wrong replacement of POST-parameters in httpd ('<' and '>' are still replaced, don't know why): http://www.yacy-forum.de/viewtopic.php?t=3466
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3324 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-03 01:26:05 +00:00
karlchenofhell
e68cdeeeb3 - reverted parseArg(String) to use a byte-array to handle correct UTF-8 parsing
- arguments aren't passed html-escaped to the servlets anymore, bug-fix for http://www.yacy-forum.de/viewtopic.php?p=30573

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3321 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-02 21:20:53 +00:00
orbiter
47ab83a7c0 added flag for YaCyHop - proxy access for all paths that start with /yacy/
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3304 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-31 00:09:51 +00:00