What do you use to capture/save web pages for archival purposes?

Tamales
Posts: 1296
Joined: Sat Jul 05, 2014 10:47 am

What do you use to capture/save web pages for archival purposes?

Post by Tamales » Fri Jul 07, 2017 2:29 pm

It used to be that CTRL+P within the Chrome browser would capture most web page content as a multi-page pdf file with no problem.

But it has progressively deteriorated over the past couple years to the point that it rarely works anymore, and you often get a jumble of misaligned text, overlapping text, and overlaid banners that make the result essentially unreadable, or key parts end up masked by blocks that were not there in the original.

Does anyone have a reliable way to capture web pages as pdf files where the original content and alignment are preserved, and the text can be searched and selected/copied (i.e. I don't want an image file)?

Using an old acronym, I'd like a WYSIWYG pdf capture of web pages.

JamesSFO
Posts: 3111
Joined: Thu Apr 26, 2012 10:16 pm

Re: What do you use to capture/save web pages for archival purposes?

Post by JamesSFO » Fri Jul 07, 2017 2:45 pm

Personally I use Evernote and their web clipper extension. It works surprisingly well and can often create a "reader"-like view for commonly used sites such as the NY Times, so the annoying headers and ads are omitted if you prefer.

Tamales
Posts: 1296
Joined: Sat Jul 05, 2014 10:47 am

Re: What do you use to capture/save web pages for archival purposes?

Post by Tamales » Fri Jul 07, 2017 3:19 pm

Thanks James. I've heard of Evernote but never looked into it. They have a pretty sparse home page, but this looks like a cloud-based service, is that correct?

I'd rather stick with an application or browser add-on resident on my hard drive and content stored on my hard drive.

sksbog
Posts: 219
Joined: Wed Jun 20, 2012 9:14 pm

Re: What do you use to capture/save web pages for archival purposes?

Post by sksbog » Fri Jul 07, 2017 3:24 pm

MS OneNote comes default with most newer PCs.
You could send the printout to OneNote, which has features to organize the articles you capture.


renue74
Posts: 1185
Joined: Tue Apr 07, 2015 7:24 pm

Re: What do you use to capture/save web pages for archival purposes?

Post by renue74 » Fri Jul 07, 2017 3:43 pm

I'm on a Mac. I use "shift-command-4" and take a screenshot. All my screenshots are saved to my dropbox folder called "Screenshots" automatically.

I can then view screenshots across all my devices, laptop, phone, ipad.

Fclevz
Posts: 380
Joined: Fri Mar 30, 2007 11:28 am

Re: What do you use to capture/save web pages for archival purposes?

Post by Fclevz » Fri Jul 07, 2017 3:49 pm

renue74 wrote: I'm on a Mac. I use "shift-command-4" and take a screenshot. All my screenshots are saved to my dropbox folder called "Screenshots" automatically.

I can then view screenshots across all my devices, laptop, phone, ipad.

That simply saves the screenshot as a graphic, which, unfortunately, isn't searchable like a PDF.
A better way in Safari might be to simply choose File --> Export as PDF

jebmke
Posts: 8424
Joined: Thu Apr 05, 2007 2:44 pm

Re: What do you use to capture/save web pages for archival purposes?

Post by jebmke » Fri Jul 07, 2017 4:00 pm

I use Evernote with web clipper. The tags make it easy to cross-reference different topics and combine with other sources of information.
When you discover that you are riding a dead horse, the best strategy is to dismount.

majiaknight
Posts: 55
Joined: Tue Jan 26, 2016 2:55 pm

Re: What do you use to capture/save web pages for archival purposes?

Post by majiaknight » Fri Jul 07, 2017 7:06 pm

+1 Evernote and their web clipper extension

Every webpage you clipped could be easily reviewed and managed (via tagging) on your desktop/mobile Evernote app even offline.

nisiprius
Advisory Board
Posts: 36881
Joined: Thu Jul 26, 2007 9:33 am
Location: The terrestrial, globular, planetary hunk of matter, flattened at the poles, is my abode.--O. Henry

Re: What do you use to capture/save web pages for archival purposes?

Post by nisiprius » Fri Jul 07, 2017 7:27 pm

I use various techniques. None of them are very good. One problem, of course, is that the Web is hypertext and PDFs etc. are pretty much linear text. Another is that it seems as if many websites really don't want you to save them.

On the Mac, Safari has a "Save as... web archive" feature. This actually works beautifully. It captures the exact appearance of the web page, images and all. I don't know exactly what it's doing, but it's obviously preserving the CSS. When you open the page, not only do you get the exact page appearance but all of the links in it are live. The biggest problem with this feature is that it seems to be a Safari-specific file format, which probably will serve me well as long as I keep using a Mac and Apple doesn't change Safari too much, but... you know... it's like old Multiplan files. They probably have a much shorter useful "archival life" than .pdf or .jpg. I'm pretty sure some of the Linux-world web browsers have a similar feature.

Sometimes I get good results with "printing" the file and selecting "save as PDF" in the print dialog. The problem with this is that many web pages print very poorly and relatively few of them have a "printer friendly" option any more.

It's quite unpredictable whether or not any given web page will offer a "reading view" in Safari. If it does, then maybe 3/4 of the time the "reading view" will be a lovely, clean, crap-free, sensibly laid out page that can then be printed to PDF, saved as a web archive, or whatever. About 1/4 of the time the "reading view" has serious problems, the most common being that it contains only a small fraction of the page. "Reading view" suffers from the problem that Safari is trying to do something that many websites don't want done.

Many articles are designed to be spread out across six or ten web pages to force you to view as many ads as possible. Nothing seems to do a very good job of following the links for you.

Ah, for several years I used an extension called "Readability" that did a good job on maybe half of all websites of providing a clean reading view and, this was slick, gave a single-click option of sending that clean, readable view to my Kindle as a Kindle document. However, it's no longer available.

Another problem is that in some cases you really want to follow links and save the whole tree structure of a website. There are various tools available for this on various platforms. I use SiteSucker for the Mac to capture my own websites, which works pretty well and has no issues because my own websites are not designed to try to stop people from capturing them. Not that anybody would want to. (I'm talking about personal junk, intended for, say, fifty classmates going to a high school reunion, stuff like that...)
Annual income twenty pounds, annual expenditure nineteen nineteen and six, result happiness; Annual income twenty pounds, annual expenditure twenty pounds ought and six, result misery.
