[Bug-wget] Filename should end with the first '?' character
m***@cyber-dome.com
2018-11-10 21:19:46 UTC
Hello all,

WordPress has 'invented' a way to avoid caching of static content and force
it to be downloaded every time.
It does so by appending parameters to the requested file name. This "feature"
slows down page loading and creates a problem when using wget.

For example, a "ver" parameter is appended to a stylesheet URL:
https://condo-farm.com/wp-content/themes/DazChild/style.css?ver=4.8.7

In wget1 the '?' character was replaced with the string %3F (the hex value of
the character, it seems) and it worked somehow:
https://condo-farm.com/wp-content/themes/DazChild/style.css%3Fver=4.8.7.css

The generated file name was
style.css%3Fver=4.8.7.css, and this allowed the website to be served by an
HTTP server.
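
As a small illustration of the escaping described above (not wget's actual code), Python's standard `urllib.parse.quote` produces the same %3F form. Keeping '=' unescaped is an assumption made here to match the file name quoted above.

```python
from urllib.parse import quote, unquote

remote_name = "style.css?ver=4.8.7"

# '?' is code point 0x3F, so percent-encoding turns it into "%3F".
# safe="=" keeps '=' literal, as in the file name quoted in this post.
escaped_name = quote(remote_name, safe="=")

print(hex(ord("?")))   # 0x3f
print(escaped_name)    # style.css%3Fver=4.8.7

# Decoding restores the original name.
assert unquote(escaped_name) == remote_name
```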

In wget2 the character is not replaced and the generated filename is
style.css\?ver\=4.8.7

This file can't be served by an HTTP server, because the server strips the
parameters from the requested filename.
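
To make the reason concrete, here is a rough sketch of how a request target is split into path and query, using Python's `urllib.parse.urlsplit`. The helper name `split_target` is made up for this example, and the assumption is that a typical static file server looks up only the path part.

```python
from urllib.parse import urlsplit

def split_target(link: str) -> tuple[str, str]:
    """Split a link into (path, query) at the first unescaped '?'."""
    parts = urlsplit(link)
    return parts.path, parts.query

# Unescaped '?': the server sees the path "style.css" plus a query string,
# so a file saved under the full name is never looked up.
print(split_target("style.css?ver=4.8.7"))        # ('style.css', 'ver=4.8.7')

# Escaped '?': the whole string stays in the path, the query is empty.
print(split_target("style.css%3Fver=4.8.7.css"))  # ('style.css%3Fver=4.8.7.css', '')
```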

I suggest stripping the parameter string from the filename and saving the
file as "style.css".
More than that: if the file is static content (html, js, css...), I suggest
also stripping the parameter string in the links that refer to it.
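
A minimal sketch of that suggestion, only to show the idea and not how wget2 does it. The names `STATIC_EXTENSIONS`, `strip_query` and `rewrite_links` are hypothetical, the extension list is an assumption, and the regular expression only handles double-quoted `href`/`src` attributes.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumed list of "static" extensions; adjust as needed.
STATIC_EXTENSIONS = (".html", ".js", ".css")

def strip_query(url: str) -> str:
    """Drop the '?...' part, but only for URLs that look like static files."""
    parts = urlsplit(url)
    if parts.path.lower().endswith(STATIC_EXTENSIONS):
        parts = parts._replace(query="")
    return urlunsplit(parts)

def rewrite_links(html: str) -> str:
    """Apply strip_query to every double-quoted href/src attribute."""
    return re.sub(
        r'\b(href|src)="([^"]*)"',
        lambda m: f'{m.group(1)}="{strip_query(m.group(2))}"',
        html,
    )

print(strip_query("https://condo-farm.com/wp-content/themes/DazChild/style.css?ver=4.8.7"))
print(rewrite_links('<link rel="stylesheet" href="style.css?ver=4.8.7">'))
print(strip_query("image.php?name=sun"))  # left alone: not a static extension
```

Limiting the stripping to known static extensions keeps dynamic URLs such as "image.php?name=sun" intact.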

What do you think?

Michael
Tim Rühsen
2018-11-12 12:13:58 UTC
Hi Michael,

we have this in Wget2 already.

From the docs:

### `--cut-url-get-vars`

Remove HTTP GET Variables from URLs.
For example "main.css?v=123" will be changed to "main.css".
Be aware that this may have unintended side effects, for example
"image.php?name=sun" will be changed
to "image.php". The cutting happens before adding the URL to the
download queue.


### `--cut-file-get-vars`

Remove HTTP GET Variables from filenames.
For example "main.css?v=123" will be changed to "main.css".

Be aware that this may have unintended side effects, for example
"image.php?name=sun" will be changed
to "image.php". The cutting happens when saving the file, after
downloading.

File names obtained from a "Content-Disposition" header are not
affected by this setting (see --content-disposition),
and can be a solution for this problem.

When "--trust-server-names" is used, the redirection URL is affected
by this setting.


Regards, Tim
Michael
2018-11-12 17:05:45 UTC
Hi Tim and all,

Thank you for your reply.

`--cut-file-get-vars` works fine. It should be the default behavior, or people from around the world will curse us for wasting several hours of their time on this.

I did not notice any effect from `--cut-url-get-vars`.


Which brings me to my first 'assignment':

"Be aware that this may have unintended side effects, for example
"image.php?name=sun" will be changed
to "image.php". The cutting happens before adding the URL to the
download queue."

If "GET image.php?name=sun" returns content of type image/jpeg, the result should be saved as image_sun.jpg and the HTML code should reference it accordingly.
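
A rough sketch of the naming rule I have in mind, not existing wget2 behavior. The function `local_name`, the `EXTENSION_FOR_TYPE` table and the character filter are all assumptions made for the example.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import parse_qsl, urlsplit

# Assumed mapping from Content-Type to file extension (deliberately short).
EXTENSION_FOR_TYPE = {
    "image/jpeg": ".jpg",
    "image/png": ".png",
    "text/html": ".html",
}

def local_name(url: str, content_type: str) -> str:
    """Build a file name from the script name, the GET values and the type."""
    parts = urlsplit(url)
    script = PurePosixPath(parts.path)
    # Keep only harmless characters from each parameter value.
    values = [re.sub(r"[^A-Za-z0-9._-]", "_", v) for _, v in parse_qsl(parts.query)]
    media_type = content_type.split(";")[0].strip().lower()
    # Fall back to the original extension for unknown content types.
    extension = EXTENSION_FOR_TYPE.get(media_type, script.suffix)
    return "_".join([script.stem, *values]) + extension

print(local_name("image.php?name=sun", "image/jpeg"))  # image_sun.jpg
```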

My initial motivation for approaching the wget project was to avoid creating a directory for every directory GET in WordPress. When the returned content type is html, a file named directory.html should be created and the HTML menu code should reference it properly.
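
Again only as a sketch of the idea, not of anything wget2 currently does: the hypothetical helper `page_file_name` below maps a directory-style URL that returns HTML to a single "directory.html" file. The host example.com and the "index.html" name for the site root are placeholders chosen for the example.

```python
from urllib.parse import urlsplit

def page_file_name(url: str, content_type: str) -> str:
    """Map a directory-style URL that returns HTML to 'directory.html'."""
    path = urlsplit(url).path or "/"
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "text/html" and path.endswith("/"):
        trimmed = path.rstrip("/")
        # The site root has no directory name, so use "index" (assumption).
        return (trimmed or "/index").lstrip("/") + ".html"
    # Anything else keeps its path as the relative file name.
    return path.lstrip("/")

print(page_file_name("https://example.com/about/", "text/html; charset=UTF-8"))  # about.html
print(page_file_name("https://example.com/", "text/html"))                       # index.html
print(page_file_name("https://example.com/wp-content/style.css", "text/css"))    # wp-content/style.css
```

The links in the HTML menu would then have to be rewritten to the same names so that the saved pages still point at each other.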

What do you think?

Michael
