[personal profile] chebe
Edit: check the comments for more recent methods some other helpful people have found.

NOTE: these are instructions for LJ's old ScrapBook service. If you've been moved to the new one, the process is mostly the same, but some changes will be needed. See the link in this comment.

Step 1: cookies.txt

My browser of choice is Firefox, which since v3 has stored cookies in SQLite databases instead of dumping them into a .txt file as it used to. The download tool I'm going to use requires the older-style cookies file, so I went and found a plugin (Firefox only). You have to restart Firefox after installation.

Now, go to LJ and log in. Then go to Tools > Export Cookies... and choose a location to save the file.

Note: the next step can be done without the cookies file, but only your publicly visible galleries and files will be saved. The cookies.txt file is a snapshot. If you've logged out and back in since you made it, you will have to regenerate it the next time you want to back up your ScrapBook.
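
Before moving on, it can be worth a quick check that the exported file actually contains LJ cookies. This is a minimal sketch, assuming the plugin writes the usual tab-separated Netscape cookie format with the domain in the first column. The function name count_lj_cookies is made up for illustration.

```shell
# Count cookie entries whose domain ends in livejournal.com.
# Assumes the tab-separated Netscape cookies.txt layout (7 fields per
# entry, domain first). Prints 0 if no matching entries are found.
count_lj_cookies() {
    awk -F '\t' 'NF >= 7 && $1 ~ /livejournal\.com$/ { n++ } END { print n + 0 }' "$1"
}

# Usage: count_lj_cookies cookies.txt
```

If it prints 0, the export probably happened before logging in, or the file is in a different format than assumed here.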

Step 2: go get those files
I'm using wget. It exists on many platforms, but I'm running it, and the other programs used later on, under Linux.

(Do you have any Unsorted files? Please see edit note at bottom of post.)

Option A: greedily grab it all

The short options are;
  • -nc; no clobber, meaning don't download additional copies of existing files
  • -np; no parent. This is very IMPORTANT: it means wget only looks in subdirectories and never ascends to the parent directory, so it doesn't wander off saving everything it finds along the way
  • -r; recursive
  • -o output.file; redirect output to specified log file

Put it all together and you have;
    wget --load-cookies cookies.txt -nc -np -r -o log.txt

    This will run for quite some time; the more pics you have, the longer it will take.

    But wget downloads everything, not just pictures, and it saves it all in ScrapBook's hierarchy (which is not the most useful for humans). So it's going to need sorting, but I'll leave that up to you.

    Option B: pick and choose

    Part 1; spider
  • --spider; follow the links as usual, but don't download any files
  • -nd; no directories, don't create any

    wget --load-cookies cookies.txt --spider -nd -np -r -o spider.txt

    I left this running for a few hours. LJ says I have over 400 ScrapBook files (according to my Profile page, though that only counts public images), using under 300MB of storage. I also have a very unreliable network. In the end spider.txt reached 1.6MB and finished with this;
    Downloaded: 817 files, 5.1M in 3m 33s (24.3 KB/s)

    Part 2; filter and grab
    cat spider.txt | grep | sed -r 's/^.*(pics\.livejournal\.com\/your_user_name\/pic\/[0-9a-z]{8}).*$/http:\/\/\1/' | sort -n | uniq > links.txt

    The resulting links.txt file was 635 lines (links) long.
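
To see what the sed expression does, here is a small sketch that runs the same substitution over a few made-up log lines. The sample lines, the 0001abcd-style ID, and the function name extract_links are illustrative assumptions. The grep pattern is also an assumption about what a sensible filter looks like, not necessarily the one used above. It needs GNU sed for -r.

```shell
# Pull unique picture links out of a wget spider log read from stdin.
# Assumes GNU sed (-r) and that your_user_name has been replaced
# with the real account name. The grep filter is an assumed pattern.
extract_links() {
    grep 'pics\.livejournal\.com/your_user_name/pic/' |
        sed -r 's/^.*(pics\.livejournal\.com\/your_user_name\/pic\/[0-9a-z]{8}).*$/http:\/\/\1/' |
        sort -u
}

# Made-up sample input: the same picture appears twice, plus an unrelated line.
printf '%s\n' \
    '--2011-01-01 12:00:00--  http://pics.livejournal.com/your_user_name/pic/0001abcd/' \
    'Saving to: pics.livejournal.com/your_user_name/pic/0001abcd/index.html' \
    '--2011-01-01 12:00:05--  http://www.livejournal.com/' |
    extract_links
```

The two lines that mention the picture collapse into one link without the trailing slash, and the unrelated line is dropped by the grep step.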

  • -i input.file; input file of links to visit

    wget --load-cookies cookies.txt -i links.txt -np -o dl.txt

    dl.txt ended with;
    Downloaded: 629 files, 262M in 56m 56s (122 KB/s)
    And after everything, 629 images ended up on my local hard drive, weighing in at just 264MB.

    If you are getting 'ERROR 403: Forbidden' messages you probably aren't signed in while trying to access protected images. Make sure you are loading your cookies!
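
To find out which links failed, the log can be scanned for error lines. This is a rough sketch, assuming dl.txt uses wget's usual layout, where each request starts with a "--timestamp--  URL" line and a failure is reported on a later line containing ERROR and a status code. The function name failed_urls is hypothetical.

```shell
# Print the URL of every request in a wget log that ended in an HTTP error.
# Assumes request lines look like "--2011-01-01 12:00:00--  URL" and
# failure lines look like "2011-01-01 12:00:01 ERROR 403: Forbidden."
failed_urls() {
    awk '/^--[0-9]/ { url = $NF } / ERROR [0-9]+:/ { print url }' "$1"
}

# Usage: failed_urls dl.txt > retry.txt
```

Once the cookies are sorted out, the resulting list can be fed back to wget with -i, just like links.txt.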

    When all is said and done you'll have your images, with their LJ names (helpful if you've linked them in journal posts but want to move image hosting), but you'll lose any gallery/directory structure you may have hoped for.

    And please don't abuse this tool; annoying LJ won't help anyone get their images.
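
One way to go easier on LJ's servers is to have wget pause between requests and cap its bandwidth. This is a sketch only: the one-second wait and the 200k rate cap are arbitrary values picked for illustration, and polite_wget is a made-up wrapper name. --wait, --random-wait and --limit-rate are standard wget options.

```shell
# Wrapper that adds throttling options to any wget invocation.
# The wait time and rate limit are illustrative values, not recommendations.
polite_wget() {
    wget --wait=1 --random-wait --limit-rate=200k "$@"
}

# Usage, with the same options as the final download step:
#   polite_wget --load-cookies cookies.txt -i links.txt -np -o dl.txt
```

The download will take longer this way, but it spreads the requests out instead of hammering the server.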

    *edit* Are you missing the images from the Unsorted gallery? Yeah, me too. It seems that, because they aren't linked like the others, they get missed.

    > Prevention: before you do anything with wget, create a gallery and add all the images in Unsorted to it.
    > Patch: hopefully you don't have too many Unsorted. Go through them one-by-one copying link location (leave off the trailing /), put them in a new .txt file, and run the last command with the new input file. (Or just save them manually while you're there anyway getting the links.)
    If you do have lots, you could try spidering from, but I haven't done it.
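
If you collect the Unsorted links by copy and paste, the trailing slashes can be stripped in one go rather than by hand. A small sketch follows. The file names unsorted_raw.txt, unsorted.txt and unsorted_dl.txt, and the function name strip_trailing_slash, are placeholders.

```shell
# Remove trailing whitespace and trailing slashes from each line read
# on stdin, and drop any lines that end up empty.
strip_trailing_slash() {
    sed -e 's/[[:space:]]*$//' -e 's:/*$::' -e '/^$/d'
}

# Usage:
#   strip_trailing_slash < unsorted_raw.txt > unsorted.txt
#   wget --load-cookies cookies.txt -i unsorted.txt -np -o unsorted_dl.txt
```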
    Page generated 2017-Oct-21, Saturday 01:12 am
    Powered by Dreamwidth Studios