chebe: (OlympusCamera)
[personal profile] chebe
Edit: check the comments for more recent methods that other helpful people have found.

NOTE: these are instructions for LJ's old ScrapBook service. If you've been moved to the new one the process is mostly the same, but some changes will be needed. See the link in this comment.

Step 1: cookies.txt

My browser of choice is Firefox, which since v3 has stored cookies in an SQLite database instead of the older method of dumping them into a .txt file. The download tool I'm going to use requires the older-style cookies file, so I went and found a plugin (Firefox only) that exports one. You have to restart the browser after installation.

Now, go to LJ and log in. Then Tools > Export Cookies... and choose a save file location.

Note: the next step can be done without the cookies file, but only your publicly visible galleries/files will be saved. The cookies.txt file is a snapshot; you will have to regenerate it next time you want to back up your ScrapBook if you've logged out and back in in the meantime.
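For reference, the exported file is in the old Netscape cookie format that wget understands: a comment header plus one tab-separated line per cookie. The entry below is a made-up illustration (the cookie name and value are placeholders, not LJ's real session cookie):

```shell
# Write a minimal example cookies.txt in the Netscape format wget reads.
# Fields per line (tab-separated): domain, include-subdomains flag, path,
# secure flag, expiry (unix time), cookie name, cookie value.
printf '# Netscape HTTP Cookie File\n' > cookies-example.txt
printf '.livejournal.com\tTRUE\t/\tFALSE\t1999999999\texample_session\tPLACEHOLDER\n' >> cookies-example.txt
cat cookies-example.txt
```

If wget ignores your real exported file, the usual culprit is a missing header line or spaces where the tabs should be.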


Step 2: go get those files
I'm using wget. It exists on many platforms, but I'm running it, and the other programs used later on, under Linux.

(Do you have any Unsorted files? Please see the edit note at the bottom of this post.)

Option A: greedily grab it all

The short options are:
  • -nc: no clobber, meaning don't download additional copies of existing files
  • -np: no parent. This one is very IMPORTANT; it means only look in subdirectories, don't ascend into the depths of livejournal.com saving everything along the way
  • -r: recursive
  • -o output.file: redirect output to the specified log file

Put it all together and you have:
  wget --load-cookies cookies.txt -nc -np -r -o log.txt http://pics.livejournal.com/your_user_name

This will run for quite some time; the more pics you have, the longer it will take.

But wget downloads everything, not just pictures, and in ScrapBook's hierarchy (which is not the most useful for humans). So it's going to need sorting, but I'll leave that up to you.
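If all you want out of that tree is the pictures, something like this sketch works. The directory layout built here only mimics what wget creates, so substitute your real download directory; also note ScrapBook files may lack normal extensions, in which case match on the pic/ path instead of the extension:

```shell
# Demo: flatten image files out of a wget-style directory tree.
# The tree built here stands in for a real pics.livejournal.com dump.
mkdir -p pics.livejournal.com/your_user_name/pic/000abcde
touch pics.livejournal.com/your_user_name/pic/000abcde/original.jpg
touch pics.livejournal.com/your_user_name/index.html

mkdir -p sorted
find pics.livejournal.com -type f \
  \( -iname '*.jpg' -o -iname '*.jpeg' -o -iname '*.png' -o -iname '*.gif' \) \
  -exec cp {} sorted/ \;
ls sorted    # only the image files land here
```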


Option B: pick and choose

Part 1: spider
  • --spider: follow the links as usual, but don't download any files
  • -nd: no directories, don't create any

  wget --load-cookies cookies.txt --spider -nd -np -r -o spider.txt http://pics.livejournal.com/your_user_name

I left this running for a few hours. LJ says I've over 400 ScrapBook files (on my Profile page, but these are just the public images), using <300MB of storage. And I have a very unreliable network. In the end spider.txt reached 1.6MB and finished with this:
Downloaded: 817 files, 5.1M in 3m 33s (24.3 KB/s)

Part 2: filter and grab
cat spider.txt | grep pics.livejournal.com/your_user_name/pic | sed -r 's/^.*(pics\.livejournal\.com\/your_user_name\/pic\/[0-9a-z]{8}).*$/http:\/\/\1/' | sort -n | uniq > links.txt

The resulting links.txt file was 635 lines/links long.
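If you want to sanity-check that pipeline before pointing it at a real log, here is the same grep/sed run over a single fabricated spider.txt line (the pic ID is invented; also note that sed's -r flag is GNU-specific — BSD/macOS sed spells it -E):

```shell
# Run the filter over one fake log line to see what it extracts.
printf 'Spider mode enabled. http://pics.livejournal.com/your_user_name/pic/000a1b2c/\n' > spider-sample.txt
grep pics.livejournal.com/your_user_name/pic spider-sample.txt \
  | sed -r 's/^.*(pics\.livejournal\.com\/your_user_name\/pic\/[0-9a-z]{8}).*$/http:\/\/\1/' \
  | sort -n | uniq > links-sample.txt
cat links-sample.txt
```

The sed turns each matching log line into a clean http://pics.livejournal.com/your_user_name/pic/XXXXXXXX link, and sort/uniq drops the duplicates the spider inevitably visits more than once.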

  • -i input.file: input file of links to visit

  wget --load-cookies cookies.txt -i links.txt -np -o dl.txt

dl.txt ended with:
Downloaded: 629 files, 262M in 56m 56s (122 KB/s)
And after everything, 629 images end up on my local hard drive, weighing in at just 264MB.

If you are getting 'ERROR 403: Forbidden' messages, you probably aren't signed in while trying to access protected images. Make sure you are loading your cookies!
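One quick way to spot that situation is to count 403s in the log. The dl-sample.txt below is a stand-in created for the demo; point the grep at your real -o log file:

```shell
# Demo: count 'ERROR 403' lines in a wget log to spot auth failures.
printf '2011-05-30 ERROR 403: Forbidden.\nDownloaded: 629 files\n' > dl-sample.txt
grep -c 'ERROR 403' dl-sample.txt    # anything above 0 means refused files
```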


When all is said and done you'll have your images with their LJ names (helpful if you've linked them in journal posts but want to move image hosting), but you'll lose any gallery/directory structure you may have hoped for.

And please don't abuse this tool; annoying LJ won't help anyone get their images.


*edit* Are you missing the images from the Unsorted gallery? Yeah, me too. It seems that because they aren't linked like the others, they get missed.

> Prevention: before you do anything with wget, create a gallery and add all the images in Unsorted to it.
> Patch: hopefully you don't have too many Unsorted images. Go through them one by one copying the link location (leave off the trailing /), put the links in a new .txt file, and run the last command with the new input file. (Or just save them manually while you're there getting the links anyway.)
If you do have lots, you could try spidering from http://pics.livejournal.com/manage/pics?gal=1, but I haven't tried it.

Date: 2011-05-30 04:46 am (UTC)
foxfirefey: Fox stealing an egg. (mischief)
From: [personal profile] foxfirefey
Oooh, could I post a link about this to [community profile] lj_refugees? It's a great resource.

Date: 2011-05-31 12:30 pm (UTC)
pfctdayelise: (thoughtful musing (waking life))
From: [personal profile] pfctdayelise
This looks very useful, thanks! I'm going to bookmark it for one day when data integrity makes it to the top of my priorities (it happens every year or so :)).

Date: 2011-12-13 08:44 am (UTC)
hel: (Default)
From: [personal profile] hel
Thank you Thank you Thank you! I was searching and searching for some tool to do this with, and constantly going through my head was "surely wget could do this, if I just knew what to tell it!" Now I have rescued all my pictures, stuff I had lost all other copies of!

Date: 2011-12-13 03:20 pm (UTC)
hel: (Default)
From: [personal profile] hel
I did use cookies, with the plugin you linked, in Firefox... 7? 8? Dunno, not at that computer now.
And I'm not sure, but I believe I found your post via Google.

Date: 2012-04-12 03:59 am (UTC)
ladykalessia: (Default)
From: [personal profile] ladykalessia
Before I go deep-diving into the manpages, has anyone ever done this using curl? (I'm not seeing all the equivalent args in --help.) It looks like a default install of OS X doesn't grok wget.

Date: 2012-04-12 04:13 pm (UTC)
ladykalessia: (Default)
From: [personal profile] ladykalessia
Ah, that's right. I've used it before for a range of values like http://www.myawesomesite.com/images/Picture[1-99].jpg, which is why I kept thinking it could do recursion. I wonder though if it would parse a wildcard as those weirdly hashed image names and grab all of the original source images. This is of course provided you could point it at the correct depth in the Scrapbook system and also get it to parse the cookies file correctly.

All of this is speculation on my part because I was able to run wget from a Linux box on our network and get a 300MB dump of my files. Good instructions, thanks!

Thanks!

Date: 2012-06-03 03:40 am (UTC)
From: [identity profile] joecarnahan.livejournal.com
Just wanted to say thanks - these were great instructions and very helpful.

LiveJournal is apparently in the process of migrating to a different implementation of Scrapbook. This was good, as it meant that they sent me an email to remind me that I even had any Scrapbook pictures that I might want to save. Unfortunately, it also meant that galleries now live under http://USERNAME.livejournal.com/pics/gallery instead of http://pics.livejournal.com/USERNAME. Still, it was relatively easy for me to adapt your instructions to the new scheme.

I documented the steps I followed here (http://joecarnahan.livejournal.com/396071.html). I suspect that if I spent a few minutes thinking about it, I could figure out how to preserve the original album structure and maybe even the album names. Still, I figured I should go ahead, post what I had, and thank you now for your instructions. :-)

Update

Date: 2013-11-22 05:00 pm (UTC)
From: (Anonymous)
Hi,
after your and joecarnahan's methods didn't work anymore, I have modified the procedure so that it can be used again to download Scrapbook photos.
My explanations can be found here. (http://andlauer.net/biologie/apps/livejournal.html)

Cheers,
Till