java.lang.Object
  org.apache.nutch.util.URLUtil
public class URLUtil
Utility class for URL analysis
Constructor Summary | |
---|---|
URLUtil() |
Method Summary | |
---|---|
static String | chooseRepr(String src, String dst, boolean temp): Given two urls (source and destination of the redirect), returns the representative one. |
static String | getDomainName(String url): Returns the domain name of the url. |
static String | getDomainName(URL url): Returns the domain name of the url. |
static DomainSuffix | getDomainSuffix(String url): Returns the DomainSuffix corresponding to the last public part of the hostname. |
static DomainSuffix | getDomainSuffix(URL url): Returns the DomainSuffix corresponding to the last public part of the hostname. |
static String[] | getHostSegments(String url): Partitions the hostname of the url by ".". |
static String[] | getHostSegments(URL url): Partitions the hostname of the url by ".". |
static boolean | isSameDomainName(String url1, String url2): Returns whether the given urls have the same domain name. |
static boolean | isSameDomainName(URL url1, URL url2): Returns whether the given urls have the same domain name. |
static void | main(String[] args): For testing. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public URLUtil()
Method Detail |
---|
public static String getDomainName(URL url)
Returns the domain name of the url. For example,
getDomainName(new URL("http://lucene.apache.org/"))
returns apache.org.
public static String getDomainName(String url) throws MalformedURLException
Returns the domain name of the url. For example,
getDomainName("http://lucene.apache.org/")
returns apache.org.
Throws:
MalformedURLException
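The apache.org example above can be reproduced with a small stand-in that does not depend on Nutch. This is only a sketch: `DomainNameSketch` and `naiveDomainName` are hypothetical names, and the "keep the last two host labels" rule is an assumption made for illustration. It is not the logic of `URLUtil.getDomainName`, and it gives wrong answers for multi-label suffixes such as co.uk and for IP addresses.

```java
import java.net.URI;

public class DomainNameSketch {

    // Naive stand-in (ASSUMPTION, not Nutch's implementation):
    // keep only the last two dot-separated labels of the host.
    public static String naiveDomainName(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            throw new IllegalArgumentException("URL has no host: " + url);
        }
        String[] labels = host.split("\\.");
        if (labels.length <= 2) {
            return host; // already a bare domain or a single label
        }
        return labels[labels.length - 2] + "." + labels[labels.length - 1];
    }

    public static void main(String[] args) {
        // prints apache.org
        System.out.println(naiveDomainName("http://lucene.apache.org/"));
    }
}
```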
public static boolean isSameDomainName(URL url1, URL url2)
Returns whether the given urls have the same domain name. For example,
isSameDomainName(new URL("http://lucene.apache.org"), new URL("http://people.apache.org/"))
will return true.
public static boolean isSameDomainName(String url1, String url2) throws MalformedURLException
Returns whether the given urls have the same domain name. For example,
isSameDomainName("http://lucene.apache.org", "http://people.apache.org/")
will return true.
Throws:
MalformedURLException
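The paired URL and String overloads, with only the String form declaring MalformedURLException, can be mirrored in a self-contained sketch. Everything here is hypothetical (`SameDomainSketch`, `sameDomain`, `sameDomainOrFalse`), and the domain comparison again assumes a naive "last two labels" rule rather than whatever Nutch actually does.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.Locale;

public class SameDomainSketch {

    // ASSUMPTION: naive domain = last two host labels, lower-cased.
    static String naiveDomain(URL url) {
        String host = url.getHost().toLowerCase(Locale.ROOT);
        String[] labels = host.split("\\.");
        int n = labels.length;
        return n <= 2 ? host : labels[n - 2] + "." + labels[n - 1];
    }

    // URL overload: input is already parsed, so nothing checked is thrown.
    public static boolean sameDomain(URL a, URL b) {
        return naiveDomain(a).equals(naiveDomain(b));
    }

    // String overload: parses first, so bad input surfaces as an exception.
    public static boolean sameDomain(String a, String b) throws MalformedURLException {
        return sameDomain(URI.create(a).toURL(), URI.create(b).toURL());
    }

    // Convenience wrapper for callers that prefer "false" over an exception.
    public static boolean sameDomainOrFalse(String a, String b) {
        try {
            return sameDomain(a, b);
        } catch (MalformedURLException | IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // prints true
        System.out.println(sameDomainOrFalse("http://lucene.apache.org", "http://people.apache.org/"));
    }
}
```

Whether unparsable input should count as "not the same domain" or be reported to the caller is a design choice. The wrapper above is just one option.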
public static DomainSuffix getDomainSuffix(URL url)
Returns the DomainSuffix corresponding to the last public part of the hostname.
public static DomainSuffix getDomainSuffix(String url) throws MalformedURLException
Returns the DomainSuffix corresponding to the last public part of the hostname.
Throws:
MalformedURLException
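One plausible reading of "last public part of the hostname" is the longest known public suffix that the host ends with. The sketch below illustrates that idea with a tiny, hypothetical suffix table. `SuffixSketch`, `SUFFIXES`, and `publicSuffix` are invented for this example, and the method returns a plain String rather than a DomainSuffix object.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

public class SuffixSketch {

    // Hypothetical suffix table; real suffix data holds far more entries.
    static final Set<String> SUFFIXES = Set.of("org", "com", "uk", "co.uk");

    // Returns the longest known suffix the host ends with, or null if none.
    public static String publicSuffix(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            return null;
        }
        String candidate = host.toLowerCase(Locale.ROOT);
        while (true) {
            if (SUFFIXES.contains(candidate)) {
                return candidate; // first hit is the longest, since we strip from the left
            }
            int dot = candidate.indexOf('.');
            if (dot < 0) {
                return null; // ran out of labels without a match
            }
            candidate = candidate.substring(dot + 1);
        }
    }

    public static void main(String[] args) {
        System.out.println(publicSuffix("http://lucene.apache.org/"));  // prints org
        System.out.println(publicSuffix("http://www.example.co.uk/")); // prints co.uk
    }
}
```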
public static String[] getHostSegments(URL url)
Partitions the hostname of the url by ".".
public static String[] getHostSegments(String url) throws MalformedURLException
Partitions the hostname of the url by ".".
Throws:
MalformedURLException
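Splitting a hostname on "." can be sketched with the standard library alone. The names `HostSegmentsSketch` and `hostSegments` are hypothetical, and returning an empty array for a URL without a host is an assumption of this sketch, not documented behaviour of `URLUtil.getHostSegments`.

```java
import java.net.URI;

public class HostSegmentsSketch {

    // Splits the host of the given URL on "." and returns the labels in order.
    public static String[] hostSegments(String url) {
        String host = URI.create(url).getHost();
        if (host == null || host.isEmpty()) {
            return new String[0]; // ASSUMPTION: no host means no segments
        }
        return host.split("\\.");
    }

    public static void main(String[] args) {
        // prints lucene, apache, org
        System.out.println(String.join(", ", hostSegments("http://lucene.apache.org/")));
    }
}
```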
public static String chooseRepr(String src, String dst, boolean temp)
Given two urls (source and destination of the redirect), returns the representative one. Implements the algorithm described in "How does the Yahoo! webcrawler handle redirects?".
Parameters:
src - Source url of redirect
dst - Destination url of redirect
temp - Flag to indicate if redirect is temporary
public static void main(String[] args)
For testing.