
JPmoney
Tufted Titmouse

Joined: 1 Dec 2007
Gender: Male
Posts: 49
Location: Minnesota

02 Jan 2008, 12:51 am

If I use the "Run..." command and type in

Code:
"C:\Program Files\Internet Explorer\iexplore.exe" google.com autism

I can have Google bring up search results on autism. Now here's the question: What's the code for saving pages? I'd like to know because I want Task Scheduler to save webpages for me at night.
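
For the Task Scheduler half, there is also a command-line way to set up a nightly job, roughly like this (save-pages.bat is only a placeholder for whatever save command you end up with, and older versions of Windows may want the /st time as HH:MM:SS):

Code:
schtasks /create /tn "Save webpages" /tr C:\save-pages.bat /sc daily /st 23:00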



gbollard
Veteran

Joined: 5 Oct 2007
Age: 57
Gender: Male
Posts: 4,009
Location: Sydney, Australia

02 Jan 2008, 2:20 am

Two things that you might be interested in...

1. You should be able to open a web page directly from the Run dialog (there's an example just after this list).

i.e. Start, Run,
http://www.google.com/search?source=ig& ... =&q=autism

2. More importantly, you can get Google to do your searches for you and email you a daily list of new results using Google Alerts. This sounds like what you want to do.

Go to ...

http://www.google.com/alerts?hl=en

and set it up there.
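
As a concrete example of the first point, a plain Google search URL can be typed straight into Start, Run and it will open in the default browser (this one uses just the basic q= parameter):

Code:
http://www.google.com/search?q=autism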



JPmoney
Tufted Titmouse

Joined: 1 Dec 2007
Gender: Male
Posts: 49
Location: Minnesota

02 Jan 2008, 9:35 am

That's a good idea, but what do I do about the other websites I want to save?



gbollard
Veteran

Joined: 5 Oct 2007
Age: 57
Gender: Male
Posts: 4,009
Location: Sydney, Australia

02 Jan 2008, 3:16 pm

Do you mean other searches you want to do, or do you mean that you want a way of copying websites for offline viewing?

There are some tools that do this, some free and others not.
This one isn't free but seems good.
http://www.internet-soft.com/extractor.htm

Personally, I use Adobe Acrobat because it copies the whole site down into a PDF. Of course, it doesn't preserve everything perfectly.



wolphin
Velociraptor

Joined: 15 Aug 2007
Age: 36
Gender: Male
Posts: 465

03 Jan 2008, 6:03 am

There is a Linux/Unix/Mac program called wget that does web downloading and supports "mirroring" a whole website (and it has probably been ported to Windows).

See http://www.jim.roberts.net/articles/wget.html
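
For reference, a typical mirroring run looks roughly like this (assuming a standard GNU wget build; --mirror recurses through the site, --convert-links rewrites the links for offline viewing, and --page-requisites grabs images and stylesheets too; example.com is just a stand-in site):

Code:
wget --mirror --convert-links --page-requisites http://www.example.com/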



JPmoney
Tufted Titmouse

Joined: 1 Dec 2007
Gender: Male
Posts: 49
Location: Minnesota

03 Jan 2008, 1:19 pm

wolphin wrote:
There is a Linux/Unix/Mac program called wget that does web downloading and supports "mirroring" a whole website (and it has probably been ported to Windows).

See http://www.jim.roberts.net/articles/wget.html

Thank you so much. I've downloaded the newest Windows version from http://xoomer.alice.it/hherold/ and I've put the files in C:\Program Files\wget. In the Run command I type in
Code:
"C:\Program Files\wget\wget.exe" website

and it works really well. However, there's one webpage, http://americasmusiccharts.com/index.cgi?fmt=A2, that I'd like to save. The problem is, wget doesn't recognize the format, so it always uses the text "cgi@fmt=A2" (the last piece of the URL) as the extension. After it's done saving, I can change the extension to .htm or .html to fix the problem, but I don't know whether that's the recommended way to do it. Is there a command-line option to specify the file type as it's saving? Also, is there a command-line option to have it keep the images with the page?



computerlove
Veteran

Joined: 10 Jul 2006
Age: 123
Gender: Male
Posts: 5,791

04 Jan 2008, 6:40 pm

DownThemAll has some nice features :)


_________________
One of God's own prototypes. Some kind of high powered mutant never even considered for mass production. Too weird to live, and too rare to die.


wolphin
Velociraptor

Joined: 15 Aug 2007
Age: 36
Gender: Male
Posts: 465

05 Jan 2008, 7:52 am

Changing the file ending to .htm or .html will "work" in some sense, but isn't really the best way to do it.

Try adding "-E" to the command line (without the quotes); here's the wget manual describing the options:

http://www.delorie.com/gnu/docs/wget/wget_9.html
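
Putting that together with the images question, something along these lines should do it: -E forces a .html extension, -p (--page-requisites) downloads the images the page needs, and -k (--convert-links) fixes the links so they point at the local copies. This assumes the Windows build supports the standard GNU options, which it should:

Code:
"C:\Program Files\wget\wget.exe" -E -k -p "http://americasmusiccharts.com/index.cgi?fmt=A2"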



JPmoney
Tufted Titmouse

Joined: 1 Dec 2007
Gender: Male
Posts: 49
Location: Minnesota

06 Jan 2008, 5:53 pm

Thank you. I have just one final question: If you're using dial-up and you're not connected to the Internet, is there a way you can command Wget (or any other program, like Internet Explorer) to log on and dial for you?



wolphin
Velociraptor

Joined: 15 Aug 2007
Age: 36
Gender: Male
Posts: 465

07 Jan 2008, 4:11 am

I know there's a "rasdial" command that's part of Windows, but I really don't know anything about how to use it.

http://www.microsoft.com/resources/docu ... x?mfr=true
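
The basic usage seems to be rasdial followed by the name of the dial-up connection (as it appears in Network Connections) and, optionally, a username and password, so a small batch file could dial, fetch, and hang up. The connection name and credentials below are placeholders:

Code:
rasdial "My ISP" myusername mypassword
"C:\Program Files\wget\wget.exe" -E -k -p "http://americasmusiccharts.com/index.cgi?fmt=A2"
rasdial "My ISP" /disconnect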



codarac
Veteran

Joined: 28 Oct 2006
Age: 47
Gender: Male
Posts: 780
Location: UK

10 Jan 2008, 6:54 pm

wolphin wrote:
There is a Linux/Unix/Mac program called wget that does web downloading and supports "mirroring" a whole website (and it has probably been ported to Windows).

See http://www.jim.roberts.net/articles/wget.html


Wow, that looks excellent!
Thanks for posting that.



codarac
Veteran

Joined: 28 Oct 2006
Age: 47
Gender: Male
Posts: 780
Location: UK

10 Feb 2008, 9:07 am

codarac wrote:
wolphin wrote:
There is a Linux/Unix/Mac program called wget that does web downloading and supports "mirroring" a whole website (and it has probably been ported to Windows).

See http://www.jim.roberts.net/articles/wget.html


Wow, that looks excellent!
Thanks for posting that.


Ok, I've only just tried it out. It's pretty good.

I didn't expect to have to save each page of a website one by one though.
Is there a way round this - some other command?

Or simply a quicker way of doing it ...
Like, is there a way you can list all the sub-addresses of a particular website, stick the list in a text file, and then type one command that reads the list in the text file, mirroring each page one by one?



wolphin
Velociraptor

Joined: 15 Aug 2007
Age: 36
Gender: Male
Posts: 465

10 Feb 2008, 7:22 pm

I don't know a way to do it based on a list of webpages to download (except to write a script that reinvokes wget over and over again), but wget has a lot of built-in options to automatically walk through a site and download all the pages at once. See:

http://www.gnu.org/software/wget/manual/wget.html#Recursive-Download
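
For example, a recursive run limited to one part of a site, a couple of levels deep, would look something like this (-r recurses, -l sets the depth, -np stops it climbing up to parent directories, and -k and -p are the link-fixing and image options mentioned earlier; the URL is just an example):

Code:
wget -r -l 2 -np -k -p http://www.example.com/section/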



LostInEmulation
Veteran

Joined: 10 Feb 2008
Age: 41
Gender: Female
Posts: 2,047
Location: Ireland, dreaming of Germany

12 Feb 2008, 4:21 am

For that, use:

--input-file=filename
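
i.e. put the addresses in a plain text file, one per line, and point wget at it with --input-file (or -i for short). Combined with the earlier options it would look roughly like this, where urls.txt is just an example filename:

Code:
"C:\Program Files\wget\wget.exe" -E -k -p --input-file=urls.txt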