4chan

As a test, I wrote a script to download the images posted on 4chan.org. I let it run for one day and ended up with 2.7GB of data across 16661 images (a number that tells me something about what I have done).

Here’s the code:

#! /usr/bin/ruby
# License: GPLv3 or newer

urls=['http://img.4chan.org/b/1.html',
    " ... other 4chan boards you want ..." ]

require 'rubygems'
require 'net/http'
require 'thread'

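def download(url)
    uri = URI.parse(url)
    # The block form of Net::HTTP.start closes the connection when the
    # block returns, so no explicit close call is needed (Net::HTTP has
    # no close method anyway).
    Net::HTTP.start(uri.host) { |h|
        page = h.get(uri.path)
        # Split on href= and newlines, keep pieces holding a quoted http URL
        data = page.body.split(/(href=|\n)/).delete_if {
                               |x| !(x =~ /\"http/)}
        # Drop the opening quote, then everything from the closing quote on
        data = data.map {|x| x.gsub(/^\"/,'')}
        data = data.map { |x| x.gsub(/\".*/,'')}
        # Keep only links that end in an image extension
        data = data.delete_if {|x| !(x =~ /(png|jpg|gif)$/)}
        data.uniq.each {|x|
            img = x.gsub(/.*\//,'')
            if !File.exist?(img) then
                puts "downloading #{img}"
                # Reuses the open connection, so this assumes the image
                # is served by the same host as the board page
                resp = h.get(URI.parse(x).path)
                File.open(img,"wb") { |f|
                    f.write(resp.body)
                }
            end
        }
    }
end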

# Start one thread per board and keep the handles so we can wait on them
threads = urls.map { |x|
    Thread.new { download(x) }
}

threads.each{ |x|
    begin
        x.join
    rescue StandardError => e
        # A failure on one board should not stop the others
        warn "thread failed: #{e.message}"
    end }
puts "finished"
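
To see what the split/filter chain inside download actually keeps, here is a small sketch that runs the same regular expressions on a made-up snippet of HTML, without touching the network. The method name image_links, the sample markup and the example.org host are only for illustration; they are not part of the script above.

```ruby
# Hypothetical helper: the same split/filter chain as in download,
# pulled out so it can be tried on a plain string.
def image_links(html)
    pieces = html.split(/(href=|\n)/)
    # keep the pieces that contain a quoted absolute URL
    pieces = pieces.select { |x| x =~ /"http/ }
    # strip the opening quote, then everything from the closing quote on
    pieces = pieces.map { |x| x.gsub(/^"/, '').gsub(/".*/, '') }
    # keep only image extensions and drop duplicates
    pieces.select { |x| x =~ /(png|jpg|gif)$/ }.uniq
end

# Made-up sample markup (example.org is a placeholder host)
sample = <<HTML
<a href="http://example.org/src/123.jpg">123.jpg</a>
<a href="http://example.org/src/123.jpg"><img src="http://example.org/thumb/123s.jpg"></a>
<a href="http://example.org/b/2.html">next</a>
<a href="http://example.org/src/456.png">456.png</a>
HTML

links = image_links(sample)
puts links
# local file names, derived the same way as in download
names = links.map { |x| x.gsub(/.*\//, '') }
puts names
```

Only URLs that directly follow href= survive the chain, so on this sample the thumbnail referenced through src= is skipped and the duplicate link is downloaded once.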

2 Responses to “4chan”

  1. shirosaki Says:

    Well mate, we have the same taste, but you know programming and I don't...

    Can the same script be written in Python, PHP or another scripting language?

    see ya!

  2. Dario Meloni Says:

    It can be done in any language you want.
