4chan

As a test, I wrote a script that downloads the images posted on 4chan.org. I let it run for one day and ended up with 2.7 GB of data across 16,661 images (a number that says something about what I have done).

Here's the code:

#! /usr/bin/ruby
# License: GPLv3 or newer

urls = ['http://img.4chan.org/b/1.html',
    # ... other 4chan boards you want ...
]

require 'rubygems'
require 'net/http'
require 'thread'

# Fetch one board page and save every linked image into the current directory.
def download(url)
    uri = URI.parse(url)
    Net::HTTP.start(uri.host) { |h|
        page = h.get(uri.path)
        # Collect every absolute link that ends in an image extension.
        links = page.body.scan(/href="(http[^"]+\.(?:png|jpg|gif))"/).flatten
        links.uniq.each { |x|
            img = File.basename(URI.parse(x).path)
            next if File.exist?(img)
            puts "downloading #{img}"
            # This reuses the board page's connection, so it assumes
            # the images are served from the same host as the page.
            resp = h.get(URI.parse(x).path)
            File.open(img, "wb") { |f| f.write(resp.body) }
        }
        # No explicit close: the block form of Net::HTTP.start
        # finishes the connection when the block returns.
    }
end

# One thread per board. Thread.new returns the thread object, so no
# shared list, mutex, or polling loop is needed to collect them.
threads = urls.map { |x| Thread.new { download(x) } }

threads.each { |x|
    begin
        x.join
    rescue StandardError => e
        # A failed board should not stop the others, but report it.
        warn "download failed: #{e.message}"
    end }
puts "finished"
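
The link-extraction step of the script can also be tried on its own, without touching the network. What follows is only a minimal sketch, assuming the links appear as double-quoted href attributes. The helpers extract_image_urls and local_name and the example.org markup are made up for illustration and are not part of the original script.

```ruby
require 'uri'

# Hypothetical helper (not in the original script): return the unique
# absolute image links found in href attributes of an HTML string.
def extract_image_urls(html)
    html.scan(/href="(http[^"]+\.(?:png|jpg|gif))"/).flatten.uniq
end

# Hypothetical helper: the local file name an image URL would be saved under.
def local_name(url)
    File.basename(URI.parse(url).path)
end

# Made-up sample markup on the reserved example.org domain.
sample = <<~HTML
    <a href="http://example.org/src/123.jpg">123.jpg</a>
    <a href="http://example.org/src/123.jpg"><img src="http://example.org/thumb/123s.jpg"></a>
    <a href="http://example.org/b/res/456.html">Reply</a>
    <a href="http://example.org/src/789.png">789.png</a>
HTML

extract_image_urls(sample).each { |u| puts "#{u} -> #{local_name(u)}" }
```

On this sample only the two full-size images are listed: the duplicate link is collapsed, and the thumbnail (a src attribute, not an href) and the reply page are skipped.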

  1. #1 by shirosaki on November 9, 2009 - 3:40 am

    Well mate, we have the same taste, but you know programming and I don't...

    Can the same script be written in Python, PHP, or another scripting language?

    See ya!

  2. #2 by Dario Meloni on November 9, 2009 - 11:55 am

    It can be done in any language you want.
