Commit 5dcd140

Improve test-crawler regexp so it can catch and visit linked feed URLs
- Legacy-Id: 7104
1 parent b4dfae1 commit 5dcd140

1 file changed

Lines changed: 1 addition & 1 deletion

ietf/bin/test-crawl

@@ -33,7 +33,7 @@ def strip_url(url):
     return url
 
 def extract_html_urls(content):
-    for m in re.finditer(r'<a.*href="([^"]+)">', content):
+    for m in re.finditer(r'<(?:a|link) [^>]*href="([^"]+)"', content):
         url = strip_url(m.group(1))
         if len(url) > MAX_URL_LENGTH:
             continue # avoid infinite GET parameter appendages
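
The effect of the regexp change can be checked with a small standalone sketch. The two patterns are copied from the diff; the `extract` helper and the sample markup are hypothetical and are not taken from the crawler or from any real page. The sketch suggests that the old pattern only matched `<a>` tags whose `href` was the last attribute before `>`, while the new one also picks up `<link>` tags (such as feed links) and anchors with attributes after `href`.

```python
import re

# Patterns copied from the diff: the old one and its replacement.
OLD_PATTERN = r'<a.*href="([^"]+)">'
NEW_PATTERN = r'<(?:a|link) [^>]*href="([^"]+)"'


def extract(pattern, content):
    """Hypothetical helper: return every href value the pattern captures."""
    return [m.group(1) for m in re.finditer(pattern, content)]


# Hypothetical sample markup, made up for this illustration.
SAMPLE = (
    '<link rel="alternate" type="application/atom+xml" href="/feed/example/">\n'
    '<a href="/doc/example/">Example</a>\n'
    '<a href="/group/example/" class="nav">Group</a>\n'
)

old_urls = extract(OLD_PATTERN, SAMPLE)
new_urls = extract(NEW_PATTERN, SAMPLE)

print("old:", old_urls)
print("new:", new_urls)
```

On this sample the old pattern finds only `/doc/example/`: it skips the `<link>` tag entirely and rejects the second anchor because `class="nav"` sits between the `href` value and the closing `>`. The new pattern returns all three URLs. Restricting the middle of the match to `[^>]*` also keeps a match from running past the end of the tag it started in, which the greedy `.*` in the old pattern did not guarantee.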
