This kind of problem is what regular expressions were made for:
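import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Backslashes must be doubled inside a Java string literal.
Pattern findUrl = Pattern.compile("\\bhttp.*?\\.pdf\\b");
Matcher matcher = findUrl.matcher("This is a URL http://www.google.com/MyDoc.pdf which should be used");
while (matcher.find()) {
    System.out.println(matcher.group());
}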
The regular expression explained:
\b
before the "http" there is a word boundary (i.e. xhttp does not match)
http
the string "http" (be aware that this also matches "https" and "httpsomething")
.*?
any character (.) any number of times (*), but using as few characters as possible (?)
\.pdf
the literal string ".pdf"
\b
after the ".pdf" there is a word boundary (i.e. .pdfoo does not match)
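To see what the lazy quantifier and the word boundaries do in practice, here is a small sketch. The class name LazyVsGreedy, the helper findAll, and the example.com-style URLs are made up for illustration; the backslashes are doubled because Java string literals require it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyVsGreedy {
    // Hypothetical helper: collects every match of regex found in input.
    public static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "See http://a.example/one.pdf and http://b.example/two.pdf today";

        // Lazy (.*?): each match stops at the first ".pdf" followed by a word boundary,
        // so the two URLs are found separately.
        System.out.println(findAll("\\bhttp.*?\\.pdf\\b", text));

        // Greedy (.*): a single match running from the first "http" to the last ".pdf".
        System.out.println(findAll("\\bhttp.*\\.pdf\\b", text));

        // No word boundary between "x" and "http", so nothing is found.
        System.out.println(findAll("\\bhttp.*?\\.pdf\\b", "xhttp://a.example/one.pdf"));

        // No word boundary between ".pdf" and "oo", so nothing is found.
        System.out.println(findAll("\\bhttp.*?\\.pdf\\b", "http://a.example/one.pdfoo"));
    }
}
```

With the lazy version you get the two URLs as separate matches; the greedy version swallows everything between them into one match, which is usually not what you want here.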
If you would like to match only http and https, use this instead of http in your pattern:
https?:
This matches the string "http", then an optional "s" (indicated by the ? after the s), and then a colon.
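A quick way to check the stricter variant is something like the following sketch. The class name SchemeCheck and the helper containsPdfUrl are hypothetical, and the pattern is simply the one from above with http swapped for https?: (backslashes doubled for the Java string literal).

```java
import java.util.regex.Pattern;

public class SchemeCheck {
    // The pattern from the answer with "http" replaced by "https?:".
    static final Pattern PDF_URL = Pattern.compile("\\bhttps?:.*?\\.pdf\\b");

    // Hypothetical helper: true if input contains an http or https URL ending in ".pdf".
    public static boolean containsPdfUrl(String input) {
        return PDF_URL.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(containsPdfUrl("http://www.google.com/MyDoc.pdf"));   // true
        System.out.println(containsPdfUrl("https://www.google.com/MyDoc.pdf"));  // true
        // "httpsomething" is no longer accepted, because a colon must follow "http" or "https".
        System.out.println(containsPdfUrl("httpsomething://host/MyDoc.pdf"));    // false
    }
}
```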