
Commit 7ee6bd4

When doing test-crawling, ignore variations of the 'next=' query arg. (When 'next' is the first query arg, any query args after it are ignored as well.)
- Legacy-Id: 18730
1 parent 9a8c6ae commit 7ee6bd4

1 file changed

Lines changed: 3 additions & 0 deletions


bin/test-crawl

@@ -83,6 +83,9 @@ def strip_url(url):
     fragment_url = re.search("^(.+)#[a-z_.-]+$", url)
     if fragment_url:
         url = fragment_url.group(1)
+    next_url = re.search(r"^(.+)\?next=.+$", url)
+    if next_url:
+        url = next_url.group(1)
     return url
 
 def extract_html_urls(content):
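To illustrate the effect of the three added lines, here is a minimal sketch that reproduces only the two stripping steps visible in the hunk. The name `strip_url_sketch` and the example URLs are hypothetical. The real `strip_url` in `bin/test-crawl` may do additional work before line 83.

```python
import re


def strip_url_sketch(url: str) -> str:
    """Hypothetical stand-in for the part of strip_url() shown in the hunk.

    Only the two steps visible in the diff are reproduced here; the real
    function may do more before them.
    """
    # Existing step: drop a trailing fragment such as "#about".
    # Note that the pattern only allows lowercase letters, '_', '.' and '-'.
    fragment_url = re.search("^(.+)#[a-z_.-]+$", url)
    if fragment_url:
        url = fragment_url.group(1)
    # Step added by this commit: drop "?next=..." and everything after it,
    # so URLs that differ only in their 'next' target compare equal.
    next_url = re.search(r"^(.+)\?next=.+$", url)
    if next_url:
        url = next_url.group(1)
    return url


if __name__ == "__main__":
    # Made-up login URLs that differ only in 'next' collapse to one entry.
    seen = {strip_url_sketch(u) for u in [
        "/accounts/login/?next=/doc/",
        "/accounts/login/?next=/group/#top",
        "/accounts/login/",
    ]}
    print(seen)  # {'/accounts/login/'}
```

Because the pattern matches a literal `?next=`, it only applies when `next` is the first query argument. A URL such as `/x/?a=1&next=/y/` is returned unchanged.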
