Discussion:
[Bug-wget] Hello again
m***@cyber-dome.com
2018-10-08 17:57:11 UTC
Hello again,

My name is Michael. I approached you about a year ago.

I am interested in making wget2 a tool that can convert the output of content
management systems (like WordPress) to static HTML. The content management
system then only has to generate the website whenever it changes, and the
presentation is done by the HTTP server alone.
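As a rough illustration of the kind of run this would involve, here is a minimal Python sketch that assembles a wget2 command line for a static mirror. The flag set (--mirror, --page-requisites, --convert-links, --adjust-extension, --no-parent, --directory-prefix) is an assumption carried over from classic GNU Wget options that wget2 also accepts; check `wget2 --help` for your version. The function name and the `static-site` output directory are hypothetical.

```python
import shlex


def build_static_mirror_command(site_url, output_dir="static-site"):
    """Return an argv list for a wget2 run that mirrors site_url as static files.

    Assumed flags (shared with classic GNU Wget; verify against wget2 --help):
      --mirror            recursive download with timestamping
      --page-requisites   also fetch the CSS, images and scripts each page needs
      --convert-links     rewrite links so the copy works from its new location
      --adjust-extension  save HTML responses with an .html suffix
      --no-parent         never ascend above the starting directory
    """
    return [
        "wget2",
        "--mirror",
        "--page-requisites",
        "--convert-links",
        "--adjust-extension",
        "--no-parent",
        f"--directory-prefix={output_dir}",
        site_url,
    ]


if __name__ == "__main__":
    cmd = build_static_mirror_command("https://example.com/")
    # Print a shell-ready command line; pass cmd to subprocess.run() to execute it.
    print(shlex.join(cmd))
```

The resulting directory could then be served by any plain HTTP server, with the CMS only used to regenerate content.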

This is an important feature, as it prevents security risks: hackers
penetrating the site and installing viruses or stealing data.
It also allows the website to be delivered much faster, as no PHP code needs
to run in order to deliver the content. Google has already announced that site
download speed is a factor in its SEO evaluation.

I will be able to work for 3 hours every week on the project. I do need some
guidance from you.

I have started to configure the Netbeans IDE, since using a debugger can help
me delve into the code much faster. There are some issues with Netbeans. Do
you use an IDE? Which one?

Best regards,

Michael
Tim Rühsen
2018-10-08 19:55:16 UTC
Post by m***@cyber-dome.com
I have started to configure Netbeans IDE as using a debugger can help me
delve into the code much faster. There are some issues with the Netbeans. Do
you use Id? Which one?
Id ? it ?

I use stock Netbeans 8.2 from https://netbeans.org/downloads/ (the 'All'
option). But you can take any 'version' and install the C/C++ plugin
afterwards.

These are the JDK packages I have installed:

default-jdk 2:1.10-68
default-jdk-headless 2:1.10-68
openjdk-10-jdk:amd64 10.0.2+13-1
openjdk-10-jdk-headless:amd64 10.0.2+13-1
openjdk-10-jre:amd64 10.0.2+13-1
openjdk-10-jre-headless:amd64 10.0.2+13-1
openjdk-7-jre-lib 7u95-2.6.4-1
openjdk-8-demo 8u181-b13-1
openjdk-8-doc 8u181-b13-1
openjdk-8-jdk:amd64 8u181-b13-1
openjdk-8-jdk-headless:amd64 8u181-b13-1
openjdk-8-jre:amd64 8u181-b13-1
openjdk-8-jre-headless:amd64 8u181-b13-1
openjdk-8-source 8u181-b13-1

What issues do you have?

Regards, Tim
m***@cyber-dome.com
2018-10-08 20:27:51 UTC
The issue I have is this:

Since the source code is split across several directories (src, lib), Netbeans loses track of the source code in the lib directory.
I verified it using gdb (you can see how deep I went).

So, can you send me your Netbeans project settings?

Thank you,

Michael
Tim Rühsen
2018-10-09 07:55:24 UTC
Post by m***@cyber-dome.com
Since the source code is split in various directories (src, lib) the Netbeans lose track of source code in the lib directory.
I verified it using gdb. (You can see how dip I went).
lib/ is an automatically created directory (gnulib stuff, created by
'bootstrap'), and normally you are not interested in its contents.

You might have the same issue with the test directories and fuzz/. I
normally right-click on the file I am interested in and enable 'Code
Assistance'.
Post by m***@cyber-dome.com
So, can you send me your Netbeans project settings?
Not the private/ stuff, but here is nbproject/configurations.xml and
nbproject/project.xml.

Regards, Tim
m***@cyber-dome.com
2018-10-09 14:39:05 UTC
Thank you!
Darshit Shah
2018-10-09 11:51:53 UTC
Hi Michael,

Nice to hear from you again. I vaguely remember a mention of someone who wanted
to work on this feature. When deciding to make this work, please remember that
it can only work if the site does not rely on JavaScript, which is difficult
given WordPress. The reason is that we do _not_ intend to ship a JavaScript
engine along with Wget2: it is too large, too unwieldy and too much of a
maintenance nightmare. However, if the site works without JavaScript, then I
would assume that Wget2 can already make a static copy of it. If it can't
handle something, please let us know or file a bug report about it.

Of course, I welcome you to work on Wget2 as you see fit. And we would love to
look at any contributions you can make. We will also try and help you out as
much as possible when dealing with the codebase.

About the dev setup, I only use vim and gdb to work with Wget. As Tim has
already mentioned, he uses Netbeans and might be able to help you out.

You also mentioned something about the lib/ directory. That is an
auto-generated dir with compatibility libs that you don't need to care about.
All the code for Wget2 is in src/ and the code for the library is in libwget/.
Those are the two main directories you need to care about. And of course tests/
for the tests.
--
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6
m***@cyber-dome.com
2018-10-09 14:54:02 UTC
Hello Darshit Shah,

Thank you for your welcome message. I am glad to be part of your project!

I don't understand the term "JavaScript engine". AFAIK, JavaScript is code that runs on the browser side, and we have no problem fetching it.

There might be "AJAX" issues with sites that rely on it. AJAX is used heavily by programmers, and they will have to take some action on their site to incorporate the engine.

POST requests for comments and mail will need to be taken care of so that they work on a static site. One solution is a hosted provider that carries out the task and delivers spam removal as well.
I think I will be able to write a howto document on that.

Michael
Darshit Shah
2018-10-11 09:34:52 UTC
Post by m***@cyber-dome.com
Hello Darshit Shah,
Thank you for your welcome message. I am glad to be part of your project!
I don't understand the term "javascript engine". AFAK javascript is code that run on the browser side, and we have no problem fetching it.
Exactly! JavaScript is code that is executed on the client side and hence
requires a JavaScript engine that interprets and executes it. However, Wget
does not and will not package a JavaScript engine to run those scripts. This
means that sites where JavaScript is used to create hyperlinks won't work well
when scraped with Wget.
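To make this limitation concrete, the following Python sketch uses the standard library's html.parser to collect <a href> targets the way any crawler without a JavaScript engine would. It is only an illustration, not wget2's own HTML parser (which is written in C); the sample page and the names `StaticLinkExtractor` and `extract_links` are hypothetical. The link that the script would add in a browser never shows up, because the script body is just opaque text to the parser.

```python
from html.parser import HTMLParser


class StaticLinkExtractor(HTMLParser):
    """Collect href values from <a> tags without executing any JavaScript.

    html.parser treats <script> contents as raw text, so links that a
    script would create at runtime are invisible here.
    """

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    parser = StaticLinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


PAGE = """
<html><body>
  <a href="/about.html">About</a>
  <div id="menu"></div>
  <script>
    // This link only exists after a browser runs the script.
    var a = document.createElement("a");
    a.href = "/hidden-page.html";
    document.getElementById("menu").appendChild(a);
  </script>
</body></html>
"""

if __name__ == "__main__":
    # Only the static link is found; /hidden-page.html is never discovered.
    print(extract_links(PAGE))
```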
Post by m***@cyber-dome.com
There might be an "ajax" issues with sites rely on it. Ajax is dealt heavy by programmers and they will have to take some action on their site to incorporate the engine.
Similarly, sites that use JavaScript to show menus or make AJAX requests are
usually not amenable to being scraped as static HTML pages.
--
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6
m***@cyber-dome.com
2018-10-12 10:58:11 UTC
Hello Darshit Shah,

Converting a CMS to static HTML pages is not a solution that suits everyone. Sites that want to stay 'dynamic' and retain "backward flik-flak" abilities might not use wget2 and keep their CMS or software behavior.

Many people creating a website use a CMS to generate the site because it keeps the website uniform and lets them make every change site-wide through a GUI. Those people might want a static website because it is faster to download (a Google SEO factor) and much more secure: it hides the CMS location and prevents login attempts.

If those people want to retain features such as RSS feeds, we might be able to tell them how.

If a website contains hidden pages that are linked by JavaScript code, the programmer might create a shell script that calls wget2 with each hidden page location.
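That suggestion could look roughly like the sketch below. It uses Python instead of a shell script: it writes the JavaScript-only URLs to a list file and builds a wget2 command that reads them via --input-file, fetching each page with its requisites. The function name, the list file name and the URLs are hypothetical, and the flags are assumed to behave as they do in GNU Wget.

```python
import shlex
import tempfile
from pathlib import Path


def build_hidden_pages_command(hidden_urls, list_file):
    """Write the hidden page URLs to list_file (one per line) and return an
    argv list asking wget2 to fetch each of them as a static page.

    Assumed flags: --input-file reads start URLs from the file;
    --page-requisites and --convert-links keep each page usable offline.
    """
    urls = [u.strip() for u in hidden_urls if u.strip()]
    if not urls:
        raise ValueError("no hidden page URLs given")
    Path(list_file).write_text("\n".join(urls) + "\n", encoding="utf-8")
    return [
        "wget2",
        "--page-requisites",
        "--convert-links",
        "--adjust-extension",
        f"--input-file={list_file}",
    ]


if __name__ == "__main__":
    # Hypothetical pages that are only reachable through JavaScript.
    hidden = [
        "https://example.com/hidden-page.html",
        "https://example.com/menu/contact.html",
    ]
    with tempfile.TemporaryDirectory() as tmp:
        cmd = build_hidden_pages_command(hidden, Path(tmp) / "hidden-urls.txt")
        # Run subprocess.run(cmd, check=True) here, while the list file exists.
        print(shlex.join(cmd))
```

Keeping the URLs in a file rather than on the command line makes the list easy to maintain as the site grows.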

Have a good weekend!

Michael