[Bug-wget] Timestamping vs incomplete downloads
Dave Warren
2018-10-22 23:49:12 UTC
Currently, when a download with timestamping enabled gets interrupted,
the timestamp of the resulting file ends up being the current time.
When wget is re-executed after connectivity is restored, the local file
is then seen as newer and skipped.

robocopy handles this a little differently, by setting a date far in the
past as a way of ensuring that on a subsequent execution the transfer
can be resumed.

Is there a better way to handle this situation in wget? A way to force
an old date on the file? I'd be happy with a fixed "in the past" date,
the server-supplied date minus a second, etc. Or some way to detect
that the file is incomplete (too small) on a subsequent run?
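
The "fixed date in the past" idea could be approximated today by a
wrapper script around wget. The following is only a minimal sketch of
that workaround, not anything wget provides: the function name and the
marker date are hypothetical, and it only helps if the wrapper actually
gets to run after the transfer was interrupted.

```python
import os

# Hypothetical marker date (1980-01-01 UTC); any time older than the
# server's copy would do. wget itself has no such option.
PAST = 315532800


def mark_incomplete(path):
    """Give a partial download an old mtime so a later -N run treats it as stale."""
    # os.utime takes (atime, mtime) in seconds since the epoch.
    os.utime(path, (PAST, PAST))
```

A wrapper would call mark_incomplete() on the output file whenever wget
exits with a non-zero status, so that the next run with timestamping
enabled no longer considers the local file newer.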
Darshit Shah
2018-10-23 07:07:08 UTC
I haven't tested it, but what you say indeed sounds like a valid bug.

The cleanest approach, IMO, is to use the extended file attributes
available on modern systems to store the remote timestamp at the very
beginning of the download and look for it on continuation. Setting the
file's time in the past doesn't work, since every write will once again
update the last-modified time, and resetting the time after each write()
is not feasible. What you suggest can only work when the client gets a
clean exit in the face of an interruption, and this isn't always the
case.
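
To illustrate the extended-attribute idea, here is a rough sketch of
storing the remote timestamp before the transfer starts and reading it
back on a later run. It is not wget code: the attribute name
"user.remote_mtime", the function names, and the ".mtime" sidecar file
used as a fallback are all assumptions made for the example, and
os.setxattr / os.getxattr exist only on Linux.

```python
import os

# Hypothetical attribute name; wget does not define or read it.
XATTR_NAME = "user.remote_mtime"


def store_remote_mtime(path, remote_mtime):
    """Record the server's timestamp for a download that is about to start."""
    value = str(int(remote_mtime)).encode("ascii")
    if hasattr(os, "setxattr"):  # Linux only
        try:
            os.setxattr(path, XATTR_NAME, value)
            return "xattr"
        except OSError:
            pass  # file system without user xattr support
    # Fallback (an assumption of this sketch): a sidecar file next to the download.
    with open(path + ".mtime", "wb") as sidecar:
        sidecar.write(value)
    return "sidecar"


def load_remote_mtime(path):
    """Return the recorded timestamp, or None if nothing was recorded."""
    if hasattr(os, "getxattr"):
        try:
            return int(os.getxattr(path, XATTR_NAME).decode("ascii"))
        except OSError:
            pass  # attribute missing or xattrs unsupported
    try:
        with open(path + ".mtime", "rb") as sidecar:
            return int(sidecar.read().decode("ascii"))
    except FileNotFoundError:
        return None
```

Because the value is written once, before any data arrives, it survives
an unclean exit, which is the property the mtime-based workaround lacks.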
Tim Rühsen
2018-10-23 07:23:03 UTC
There is an option to skip If-Modified-Since (--no-if-modified-since).
With it, wget has to make an extra HEAD request, and the response not
only returns the file's timestamp but also its length.
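
As a rough illustration of why the length matters, the check that a
HEAD response makes possible could look like the sketch below. This is
not wget's implementation: the function name is made up, and
head_headers is assumed to be a plain dict holding the Content-Length
and Last-Modified values from the HEAD response.

```python
import os
from email.utils import parsedate_to_datetime


def needs_download(local_path, head_headers):
    """Decide whether a file should be fetched again, given HEAD response headers."""
    try:
        st = os.stat(local_path)
    except FileNotFoundError:
        return True  # nothing on disk yet

    length = head_headers.get("Content-Length")
    if length is not None and st.st_size != int(length):
        return True  # size mismatch: the local copy is incomplete or outdated

    last_modified = head_headers.get("Last-Modified")
    if last_modified is None:
        return True  # no timestamp to compare against, so fetch to be safe

    remote_mtime = parsedate_to_datetime(last_modified).timestamp()
    return st.st_mtime < remote_mtime  # fetch only if the local copy is older
```

With a size comparison in place, a truncated file is caught even though
its modification time is newer than the server's.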

@Dave Could you give us an example of a command line where the issue
occurs, or could you test with --no-if-modified-since?

Regards, Tim
Tim Rühsen
2018-10-23 07:26:53 UTC
Good idea. Though the xattr stuff is not always available... but when it
is, we can indeed take advantage of it. I'll open an issue for wget2.

Regards, Tim
