D Richard Felker III
2004-03-01 05:32:58 UTC
The following code in url.c makes it impossible to request URLs that
contain multiple consecutive slashes in their query string:
  else if (*h == '/')
    {
      /* Ignore empty path elements.  Supporting them well is hard
         (where do you save "http://x.com///y.html"?), and they
         don't bring any practical gain.  Plus, they break our
         filesystem-influenced assumptions: allowing them would
         make "x/y//../z" simplify to "x/y/z", whereas most people
         would expect "x/z".  */
      ++h;
    }
Think of something like http://foo/bar/redirect.cgi?http://...
wget translates this into:
http://foo/bar/redirect.cgi?http:/...
and then the web server of course gives an error. Note that the
problem occurs even if the slashes were URL-escaped, since wget
unescapes them.
Removing the offending code fixes the problem, but I'm not sure if
this is the correct solution. I expect it would be more correct to
remove multiple slashes only before the first occurrence of ?, but not
afterwards.
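A rough sketch of that idea, as a standalone function rather than a
patch against url.c. The name collapse_path_slashes and its two-buffer
interface are made up for illustration and are not wget's; it assumes
the input is only the path-and-query part of the URL (no scheme or
host), and it does not touch the unescaping problem mentioned above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not part of wget: collapse runs of '/' in the
   path portion of IN only, and copy everything from the first '?'
   onward unchanged.  OUT must have room for strlen (in) + 1 bytes.  */
static void
collapse_path_slashes (const char *in, char *out)
{
  const char *h = in;
  char *t = out;

  assert (in != NULL && out != NULL);

  /* Path part: stop at the first '?' or at the end of the string.  */
  while (*h && *h != '?')
    {
      if (*h == '/' && t > out && t[-1] == '/')
        ++h;                    /* skip an empty path element */
      else
        *t++ = *h++;
    }

  /* Query string (if any): copy verbatim, slashes included.  This
     also writes the terminating NUL.  */
  strcpy (t, h);
}
```

With this, "/bar//redirect.cgi?http://x//y" becomes
"/bar/redirect.cgi?http://x//y": the doubled slash in the path is
collapsed, while the ones after the ? are left alone.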
Rich