Commit 4f7de7b

Add /doc/in-last-call/ to crawler, report original page as referrer in
a redirect chain rather than intermediate URL - Legacy-Id: 5633
1 parent 920c0cd commit 4f7de7b

1 file changed

Lines changed: 2 additions & 2 deletions

ietf/bin/test-crawl

@@ -24,7 +24,7 @@ connection.queries = DontSaveQueries()
 MAX_URL_LENGTH = 500
 SLOW_THRESHOLD = 1.0
 
-initial = ["/doc/all/"]
+initial = ["/doc/all/", "/doc/in-last-call/"]
 
 visited = set()
 urls = {} # url -> referrer
@@ -74,7 +74,7 @@ while urls:
     if r.status_code in (301, 302):
         u = strip_url(r["Location"])
         if u not in visited and u not in urls:
-            urls[u] = url
+            urls[u] = referrer # referrer is original referrer, not redirected url
 
     elif r.status_code == 200:
         ctype = r["Content-Type"]
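
The effect of the second hunk can be illustrated with a minimal, self-contained sketch. It is not the actual test-crawl script: the crawl() function, the fetch callback, the SITE table and the "[initial]" referrer label are all hypothetical stand-ins, assumed here only to show how the original referrer is carried through a redirect chain.

```python
def crawl(initial, fetch):
    """Visit every reachable URL and return a {url: referrer} mapping.

    fetch(url) is a hypothetical stand-in for the test client. It returns
    a tuple (status_code, location, links).
    """
    visited = {}
    # url -> referrer; "[initial]" is an assumed label for seed URLs
    urls = {u: "[initial]" for u in initial}
    while urls:
        url, referrer = urls.popitem()
        visited[url] = referrer
        status, location, links = fetch(url)
        if status in (301, 302):
            if location not in visited and location not in urls:
                # Pass on the original referrer, not the redirecting URL,
                # so a failure is reported against the page holding the link.
                urls[location] = referrer
        elif status == 200:
            for link in links:
                if link not in visited and link not in urls:
                    urls[link] = url
    return visited


# Hypothetical site: /doc/all/ links to /old/, which redirects twice.
SITE = {
    "/doc/all/": (200, None, ["/old/"]),
    "/old/": (302, "/moved/", []),
    "/moved/": (301, "/new/", []),
    "/new/": (200, None, []),
}

referrers = crawl(["/doc/all/"], lambda url: SITE[url])
print(referrers["/new/"])  # prints /doc/all/
```

With the pre-commit behaviour (urls[u] = url), the referrer recorded for /new/ in this sketch would be /moved/, an intermediate URL that appears on no page, which makes a broken link harder to trace back to its source.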
