Discussion:
[Bug-wget] [bug #54596] wget gets a lot of files named "index.html?............" and other strange file names
anonymous
2018-08-30 09:12:34 UTC
URL:
<http://savannah.gnu.org/bugs/?54596>

Summary: wget gets a lot of files named
"index.html?............" and other strange file names
Project: GNU Wget
Submitted by: None
Submitted on: Thu 30 Aug 2018 09:12:33 AM UTC
Category: Program Logic
Severity: 3 - Normal
Priority: 5 - Normal
Status: None
Privacy: Public
Assigned to: None
Originator Name: Gabriel Popescu
Originator Email: ***@gmail.com
Open/Closed: Open
Discussion Lock: Any
Release: 1.12
Operating System: GNU/Linux
Reproducibility: Every Time
Fixed Release: None
Planned Release: None
Regression: None
Work Required: None
Patch Included: None

_______________________________________________________

Details:

When wget is used recursively (wget -r -l 20 ...) to fetch the whole content of a web
site, many files named "index.html?....." appear in the top-level directory
that wget creates to save the site content.
To reproduce this behaviour, run
wget -r -l 20 notizie.lottoland.it
and look inside the notizie.lottoland.it directory that wget creates in the
directory where you ran the command: you'll find a single index.html file and
many "index.html?....." files, where the dots stand for query-string parameters
such as "p=203".
There are also many files named "wp-login.php?....." and other files with
similarly strange names in the subdirectories.
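The naming pattern described above can be approximated with a short Python sketch. It assumes, as a simplification, that the file is saved under host/path, that a URL ending in "/" is saved as index.html, and that the raw query string is appended to the file name after a literal "?". The function name `local_name` is hypothetical; this is not wget's actual code, which also handles escaping, --restrict-file-names and other cases.

```python
from urllib.parse import urlsplit


def local_name(url: str) -> str:
    """Approximate the local path a recursive download might use for url.

    Hypothetical simplification of names like "index.html?p=309":
    host directory + path, "index.html" for directory URLs, and the
    raw query string appended after a literal "?".
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html"  # directory URL -> default file name
    name = parts.netloc + path
    if parts.query:
        name += "?" + parts.query  # query string kept verbatim in the name
    return name


for u in [
    "http://notizie.lottoland.it/",
    "http://notizie.lottoland.it/?p=309",
    "http://notizie.lottoland.it/wp-login.php?action=lostpassword",
]:
    print(local_name(u))
```

Under these assumptions, each distinct query string on the same page yields a distinct file next to the plain index.html.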

For example:

# ls
amp
author
category
come-vincere-a-eurojackpot-leggi-i-nostri-4-suggerimenti
comments
feed
index.html
index.html?p=309
index.html?p=322
index.html?p=330
index.html?p=334
index.html?p=354
index.html?p=433
index.html?p=436
index.html?tm=1535446062
index.html?tm=1535619035
i-numeri-fortunati-alla-lotteria
le-probabilita-di-vincere-alla-lotteria
pago-delle-tasse-sulle-vincite-alle-lotterie
quale-lotteria-conviene-giocare
quale-lotteria-ha-piu-probabilita-di-vincere
quando-e-la-prossima-estrazione
robots.txt
wp-admin
wp-content
wp-includes
wp-json
wp-login.php
wp-login.php?action=lostpassword
wp-login.php?redirect_to=https:%2F%2Fnotizie.lottoland.it%2Fcome-vincere-a-eurojackpot-leggi-i-nostri-4-suggerimenti%2F
wp-login.php?redirect_to=https:%2F%2Fnotizie.lottoland.it%2Fi-numeri-fortunati-alla-lotteria%2F
wp-login.php?redirect_to=https:%2F%2Fnotizie.lottoland.it%2Fle-probabilita-di-vincere-alla-lotteria%2F
wp-login.php?redirect_to=https:%2F%2Fnotizie.lottoland.it%2Fpago-delle-tasse-sulle-vincite-alle-lotterie%2F
wp-login.php?redirect_to=https:%2F%2Fnotizie.lottoland.it%2Fquale-lotteria-conviene-giocare%2F
wp-login.php?redirect_to=https:%2F%2Fnotizie.lottoland.it%2Fquale-lotteria-ha-piu-probabilita-di-vincere%2F
wp-login.php?redirect_to=https:%2F%2Fnotizie.lottoland.it%2Fquando-e-la-prossima-estrazione%2F
xmlrpc.php
xmlrpc.php?rsd
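The long wp-login.php names appear to come from WordPress login links whose redirect_to parameter is itself a percent-encoded URL, which would explain the "%2F" sequences. A small sketch using Python's standard urllib.parse module decodes the query part of one of these file names; the variable names are illustrative.

```python
from urllib.parse import parse_qs

# One saved file name from the listing above.
filename = ("wp-login.php?redirect_to=https:%2F%2Fnotizie.lottoland.it"
            "%2Fquale-lotteria-conviene-giocare%2F")

# Split the requested script from its query string at the first "?".
base, _, query = filename.partition("?")
params = parse_qs(query)  # percent-decodes values, so %2F becomes "/"

print(base)                      # the script that was requested
print(params["redirect_to"][0])  # the page the login form would return to
```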

# wget --version
GNU Wget 1.12 built on linux-gnu.

Linux version 2.6.32-504.el6.x86_64 (***@c6b9.bsys.dev.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-11)




_______________________________________________________

Reply to this item at:

<http://savannah.gnu.org/bugs/?54596>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/
Tim Ruehsen
2018-08-30 09:36:14 UTC
Update of bug #54596 (project wget):

Status: None => Invalid
Open/Closed: Open => Closed

_______________________________________________________

Follow-up Comment #1:

This is expected behavior.

Query parameters normally indicate that different content is being requested.
For example, this page is "https://savannah.gnu.org/bugs/?54596", and changing
the appended number would change what you see.
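The reasoning can be illustrated with a small Python sketch: if a resource is identified by its path plus its query string, the URLs behind the saved index.html files are all distinct, whereas ignoring the query collapses them into a single page and would lose content. The URLs are reconstructed from the file names in the report (an assumption), and the keying scheme is illustrative, not how wget deduplicates URLs internally.

```python
from urllib.parse import urlsplit

# URLs assumed to correspond to some of the saved index.html* files.
urls = [
    "http://notizie.lottoland.it/",
    "http://notizie.lottoland.it/?p=309",
    "http://notizie.lottoland.it/?p=322",
    "http://notizie.lottoland.it/?tm=1535446062",
]

# Identity including the query string: every variant is a distinct resource.
with_query = {(urlsplit(u).path, urlsplit(u).query) for u in urls}
# Identity ignoring the query string: all four collapse into one path.
path_only = {urlsplit(u).path for u in urls}

print(len(with_query), len(path_only))  # 4 1
```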


