Discussion:
Bug in wget: cannot request urls with double-slash in the query string
D Richard Felker III
2004-03-01 05:32:58 UTC
The following code in url.c makes it impossible to request urls that
contain multiple slashes in a row in their query string:

  else if (*h == '/')
    {
      /* Ignore empty path elements.  Supporting them well is hard
         (where do you save "http://x.com///y.html"?), and they
         don't bring any practical gain.  Plus, they break our
         filesystem-influenced assumptions: allowing them would
         make "x/y//../z" simplify to "x/y/z", whereas most people
         would expect "x/z".  */
      ++h;
    }

Think of something like http://foo/bar/redirect.cgi?http://...
wget translates this into:

http://foo/bar/redirect.cgi?http:/...

and then the web server of course gives an error. Note that the
problem occurs even if the slashes are URL-escaped, since wget
unescapes them.

Removing the offending code fixes the problem, but I'm not sure if
this is the correct solution. I expect it would be more correct to
remove multiple slashes only before the first occurrence of ?, but not
afterwards.
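
For illustration only, here is a minimal sketch of the behaviour suggested
above: collapse runs of slashes in the path, but leave everything from the
first ? onward untouched. It is not wget's code. The helper name
collapse_path_slashes and the example URLs are hypothetical, and the sketch
assumes a plain "scheme://host/path?query" layout with no special handling
of # fragments.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of wget: collapse runs of '/' in the
   path part of URL in place.  Everything from the first '?' onward is
   left byte-for-byte intact, and the "//" after the scheme is kept.  */
static void
collapse_path_slashes (char *url)
{
  assert (url != NULL);

  char *query = strchr (url, '?');   /* start of query string, if any */
  char *sep = strstr (url, "://");   /* scheme separator, if any */
  char *start = url;

  /* Only honour "://" when it comes before the query string; a "://"
     inside the query belongs to the data and must not be skipped to.  */
  if (sep != NULL && (query == NULL || sep < query))
    start = sep + 3;

  char *end = query != NULL ? query : url + strlen (url);
  char *dst = start;
  const char *p = start;

  while (p < end)
    {
      *dst++ = *p;
      if (*p == '/')
        {
          /* Skip the rest of this run of slashes.  */
          while (p < end && *p == '/')
            ++p;
        }
      else
        ++p;
    }

  /* Append the untouched query string (or just the terminating NUL).  */
  memmove (dst, end, strlen (end) + 1);
}
```

Under these assumptions, "http://foo/bar//redirect.cgi?http://example.com//x"
becomes "http://foo/bar/redirect.cgi?http://example.com//x": the path is
collapsed and the query string is unchanged.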

Rich
Hrvoje Niksic
2004-03-01 14:36:55 UTC
Post by D Richard Felker III
The following code in url.c makes it impossible to request urls that
[...]

That code is removed in CVS, so multiple slashes now work correctly.
Post by D Richard Felker III
Think of something like http://foo/bar/redirect.cgi?http://...
wget translates this into: [...]
Which version of Wget are you using? I think even Wget 1.8.2 didn't
collapse multiple slashes in query strings, only in paths.
Post by D Richard Felker III
Removing the offending code fixes the problem, but I'm not sure if
this is the correct solution. I expect it would be more correct to
remove multiple slashes only before the first occurrence of ?, but
not afterwards.
That's exactly what should happen. Please give us more details, if
possible accompanied by `-d' output.
D Richard Felker III
2004-03-01 15:42:48 UTC
Post by Hrvoje Niksic
Post by D Richard Felker III
The following code in url.c makes it impossible to request urls that
[...]
That code is removed in CVS, so multiple slashes now work correctly.
Post by D Richard Felker III
Think of something like http://foo/bar/redirect.cgi?http://...
wget translates this into: [...]
Which version of Wget are you using? I think even Wget 1.8.2 didn't
collapse multiple slashes in query strings, only in paths.
I was using 1.8.2 and noticed the problem, so I upgraded to 1.9.1 and
it persisted.
Post by Hrvoje Niksic
Post by D Richard Felker III
Removing the offending code fixes the problem, but I'm not sure if
this is the correct solution. I expect it would be more correct to
remove multiple slashes only before the first occurrence of ?, but
not afterwards.
That's exactly what should happen. Please give us more details, if
possible accompanied by `-d' output.
If you'd still like details now that you know the version I was using,
let me know and I'll be happy to do some tests.

Rich
Hrvoje Niksic
2004-03-01 18:25:52 UTC
Post by D Richard Felker III
Post by Hrvoje Niksic
Post by D Richard Felker III
Think of something like http://foo/bar/redirect.cgi?http://...
wget translates this into: [...]
Which version of Wget are you using? I think even Wget 1.8.2 didn't
collapse multiple slashes in query strings, only in paths.
I was using 1.8.2 and noticed the problem, so I upgraded to 1.9.1
and it persisted.
OK.
Post by D Richard Felker III
Post by Hrvoje Niksic
Post by D Richard Felker III
Removing the offending code fixes the problem, but I'm not sure if
this is the correct solution. I expect it would be more correct to
remove multiple slashes only before the first occurrence of ?, but
not afterwards.
That's exactly what should happen. Please give us more details, if
possible accompanied by `-d' output.
If you'd still like details now that you know the version I was
using, let me know and I'll be happy to do some tests.
Yes please. For example, this is how it works for me:

$ /usr/bin/wget -d "http://www.xemacs.org/something?redirect=http://www.cnn.com"
DEBUG output created by Wget 1.8.2 on linux-gnu.

--19:23:02-- http://www.xemacs.org/something?redirect=http://www.cnn.com
=> `something?redirect=http:%2F%2Fwww.cnn.com'
Resolving www.xemacs.org... done.
Caching www.xemacs.org => 199.184.165.136
Connecting to www.xemacs.org[199.184.165.136]:80... connected.
Created socket 3.
Releasing 0x8080b40 (new refcount 1).
---request begin---
GET /something?redirect=http://www.cnn.com HTTP/1.0
User-Agent: Wget/1.8.2
Host: www.xemacs.org
Accept: */*
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response...
...

The request log shows that the slashes are apparently respected.
D Richard Felker III
2004-03-05 05:41:14 UTC
Post by Hrvoje Niksic
Post by D Richard Felker III
Post by Hrvoje Niksic
Post by D Richard Felker III
Removing the offending code fixes the problem, but I'm not sure if
this is the correct solution. I expect it would be more correct to
remove multiple slashes only before the first occurrence of ?, but
not afterwards.
That's exactly what should happen. Please give us more details, if
possible accompanied by `-d' output.
If you'd still like details now that you know the version I was
using, let me know and I'll be happy to do some tests.
$ /usr/bin/wget -d "http://www.xemacs.org/something?redirect=http://www.cnn.com"
DEBUG output created by Wget 1.8.2 on linux-gnu.
--19:23:02-- http://www.xemacs.org/something?redirect=http://www.cnn.com
=> `something?redirect=http:%2F%2Fwww.cnn.com'
Resolving www.xemacs.org... done.
Caching www.xemacs.org => 199.184.165.136
Connecting to www.xemacs.org[199.184.165.136]:80... connected.
Created socket 3.
Releasing 0x8080b40 (new refcount 1).
---request begin---
GET /something?redirect=http://www.cnn.com HTTP/1.0
User-Agent: Wget/1.8.2
Host: www.xemacs.org
Accept: */*
Connection: Keep-Alive
---request end---
HTTP request sent, awaiting response...
...
The request log shows that the slashes are apparently respected.
I retried a test case and found the same thing -- the slashes were
respected. Then I remembered that I was using -i. Wget seems to work
fine with the url on the command line; the bug only happens when the
url is passed in with:

cat <<EOF | wget -i -
http://...
EOF

Using this method is necessary since it is the ONLY secure way I know
of to do a password-protected http request from a shell script.
Otherwise the password appears on the command line...

Rich
Hrvoje Niksic
2004-03-05 10:41:00 UTC
Post by D Richard Felker III
Post by Hrvoje Niksic
The request log shows that the slashes are apparently respected.
I retried a test case and found the same thing -- the slashes were
respected.
OK.
Post by D Richard Felker III
Then I remembered that I was using -i. Wget seems to work fine with
the url on the command line; the bug only happens when the url is
passed in with:
cat <<EOF | wget -i -
http://...
EOF
But I cannot repeat that, either. As long as the consecutive slashes
are in the query string, they're not stripped.
Post by D Richard Felker III
Using this method is necessary since it is the ONLY secure way I
know of to do a password-protected http request from a shell script.
Yes, that is the best way to do it.
