Commit Graph

8 Commits

Author SHA1 Message Date
orbiter
d1d9fbae5c enabling the URLAnalysis to operate on multime input files, just use a wild card when calling the class from the command line
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5658 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-26 23:47:41 +00:00
orbiter
7ea53fe47b added another url list transformation option:
- check the list and kick out entries with lines that contain not valid urls
- normalize the urls
- remove doubles
- sort the list
- split the list in smaller chunks
This is all done in one process which can be called with a new -sort option

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5655 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-26 21:51:23 +00:00
orbiter
54625360f7 performance update
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5653 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-25 23:27:21 +00:00
orbiter
d884c4718a added gzip support for URLAnalysis:
url lists can also be compressed with gzip
If such a file is handed over to URLAnalysis, the output will also be written as .gz-file

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5652 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-25 13:40:51 +00:00
orbiter
cf9b74e6e3 added another method to process url lists: extract hosts only
This can be used like
java -Xmx2000m -cp classes de.anomic.data.URLAnalysis -host DATA/EXPORT/20090224213823.txt

changed als the call method to generate statistics, please use now
java -Xmx2000m -cp classes de.anomic.data.URLAnalysis -stat DATA/EXPORT/20090224213823.txt


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5650 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-24 22:51:07 +00:00
orbiter
89d8e824ed memory protection for URLAnalysis
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5649 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-24 22:05:09 +00:00
orbiter
0f6fa804ff performance update to URLAnalysis
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5648 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-24 21:35:33 +00:00
orbiter
e8f5f2f612 added tool to analyse url strings
and to generate statistics about words occurring in urls

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5646 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-24 10:00:35 +00:00