Bug in wget: cannot request urls with double-slash in the query string

Post by D Richard Felker III
The following code in url.c makes it impossible to request urls that

[...]

That code is removed in CVS, so multiple slashes now work correctly.

Post by D Richard Felker III
Think of something like http://foo/bar/redirect.cgi?http://...
wget translates this into: [...]

Which version of Wget are you using? I think even Wget 1.8.2 didn't
collapse multiple slashes in query strings, only in paths.

Post by D Richard Felker III
Removing the offending code fixes the problem, but I'm not sure if
this is the correct solution. I expect it would be more correct to
remove multiple slashes only before the first occurrence of ?, but
not afterwards.

That's exactly what should happen. Please give us more details, if
possible accompanied by `-d' output.

D Richard Felker III

2004-03-01 15:42:48 UTC

Post by D Richard Felker III
The following code in url.c makes it impossible to request urls that

[...]
That code is removed in CVS, so multiple slashes now work correctly.

Post by D Richard Felker III
Think of something like http://foo/bar/redirect.cgi?http://...
wget translates this into: [...]

Which version of Wget are you using? I think even Wget 1.8.2 didn't
collapse multiple slashes in query strings, only in paths.

I was using 1.8.2 and noticed the problem, so I upgraded to 1.9.1 and
it persisted.

That's exactly what should happen. Please give us more details, if
possible accompanied by `-d' output.

If you'd still like details now that you know the version I was using,
let me know and I'll be happy to do some tests.

Rich

Hrvoje Niksic

2004-03-01 18:25:52 UTC

Post by D Richard Felker III
Think of something like http://foo/bar/redirect.cgi?http://...
wget translates this into: [...]

Which version of Wget are you using? I think even Wget 1.8.2 didn't
collapse multiple slashes in query strings, only in paths.

I was using 1.8.2 and noticed the problem, so I upgraded to 1.9.1
and it persisted.

OK.

That's exactly what should happen. Please give us more details, if
possible accompanied by `-d' output.

If you'd still like details now that you know the version I was
using, let me know and I'll be happy to do some tests.

Yes please. For example, this is how it works for me:

$ /usr/bin/wget -d "http://www.xemacs.org/something?redirect=http://www.cnn.com"
DEBUG output created by Wget 1.8.2 on linux-gnu.

--19:23:02-- http://www.xemacs.org/something?redirect=http://www.cnn.com
=> `something?redirect=http:%2F%2Fwww.cnn.com'
Resolving www.xemacs.org... done.
Caching www.xemacs.org => 199.184.165.136
Connecting to www.xemacs.org[199.184.165.136]:80... connected.
Created socket 3.
Releasing 0x8080b40 (new refcount 1).
---request begin---
GET /something?redirect=http://www.cnn.com HTTP/1.0
User-Agent: Wget/1.8.2
Host: www.xemacs.org
Accept: */*
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response...
...

The request log shows that the slashes are apparently respected.

D Richard Felker III

2004-03-05 05:41:14 UTC

That's exactly what should happen. Please give us more details, if
possible accompanied by `-d' output.

If you'd still like details now that you know the version I was
using, let me know and I'll be happy to do some tests.


I retried a test case and found the same thing -- the slashes were
respected. Then I remembered that I was using -i. Wget seems to work
fine with the url on the command line; the bug only happens when the
url is passed in with:

cat <<EOF | wget -i -
http://...
EOF

Using this method is necessary since it is the ONLY secure way I know
of to do a password-protected http request from a shell script.
Otherwise the password appears on the command line...

Rich

Hrvoje Niksic

2004-03-05 10:41:00 UTC