Archive for January, 2009

4chan

As a test, I wrote a script that downloads the images posted on 4chan.org. I let it run for one day and ended up with 2.7GB of data in 16661 images (a number that says something about what I have done).

Here’s the code

#! /usr/bin/ruby
# License: GPLv3 or newer

urls=['http://img.4chan.org/b/1.html',
    # ... other 4chan boards you want ...
    ]

require 'rubygems'
require 'net/http'
require 'thread'

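# Fetch one board page and save every linked image that is not on disk yet.
def download(url)
    uri = URI.parse(url)
    Net::HTTP.start(uri.host) { |h|
        page = h.get(uri.path)
        # Split on href= and newlines; keep the pieces containing a quoted http URL.
        data = page.body.split(/(href=|\n)/).delete_if {
                               |x| !(x =~ /\"http/)}
        # Strip the opening quote, then everything from the closing quote on.
        data = data.map {|x| x.gsub(/^\"/,'')}
        data = data.map { |x| x.gsub(/\".*/,'')}
        # Keep only links that end in an image extension.
        data = data.delete_if {|x| !(x =~ /(png|jpg|gif)$/)}
        data.uniq.each {|x|
            img = x.gsub(/.*\//,'')   # file name = last path component
            if !File.exist?(img) then
                puts "downloading #{img}"
                # Reuses the page's connection, so this assumes the image
                # is served by the same host as the board page.
                resp = h.get(URI.parse(x).path)
                File.open(img,"wb") { |f|
                    f.write(resp.body)
                }
            end
        }
        # No explicit close: the block form of Net::HTTP.start finishes
        # the session itself, and Net::HTTP has no close method.
    }
end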

# Start one download thread per board and keep the Thread objects for joining.
threads = urls.map { |x| Thread.new { download(x) } }

# Wait for every thread; report a failure instead of silently discarding it.
threads.each{ |x|
    begin
        x.join
    rescue StandardError => e
        warn "download failed: #{e.message}"
    end }
puts "finished"
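
The link-extraction step inside download can be tried on its own, without touching the network. Here is a minimal sketch, assuming a made-up HTML fragment and a hypothetical helper called image_links. It uses String#scan instead of the split/gsub chain in the script, so treat it as an alternative way to get the same list, not as the original method.

```ruby
require 'uri'

# Hypothetical helper: return the unique image URLs found in href attributes.
def image_links(html)
  # Capture the quoted value of every href that is an absolute http(s) URL
  # ending in png, jpg or gif.
  html.scan(/href="(https?:\/\/[^"]+\.(?:png|jpg|gif))"/).flatten.uniq
end

# Made-up sample page, not real 4chan markup.
sample = <<HTML
<a href="http://example.org/src/1.jpg">1.jpg</a>
<a href="http://example.org/src/1.jpg"><img src="thumb/1s.jpg"></a>
<a href="http://example.org/res/2.html">Reply</a>
<a href="http://example.org/src/3.png">3.png</a>
HTML

links = image_links(sample)
# Local file names, derived the same way the script does (last path component).
names = links.map { |l| File.basename(URI.parse(l).path) }
names.each { |n| puts n }
```

Duplicate links collapse to one entry and the reply page is skipped, which is what the delete_if and uniq calls in the script are for.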

rTorrent 0.8.4 and ruby-controller 0.2

Yesterday I switched to rTorrent 0.8.4 (the Debian build from experimental).

After a few small adjustments to the configuration file, rTorrent started working. The only other trouble was with the torrents I hadn’t finished downloading.

Apart from these two small problems it just works, and so does the ruby controller (without any modification).
In the meantime I have added a few small fixes to the controller, plus a new feature.
The available upload bandwidth is now decreased by a fraction of rTorrent’s current download speed.
This further prevents rTorrent from slowing itself down by flooding the upload bandwidth with ACK packets.
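
The throttling rule can be sketched in a few lines of Ruby. This is only an illustration, not the controller’s actual code: the method name upload_limit, the fraction of 0.1, the 5 KB/s floor and the example figures are all assumptions.

```ruby
# Hypothetical sketch: compute an upload cap (KB/s) from the current download rate.
def upload_limit(max_upload, download_rate, fraction = 0.1, floor = 5.0)
  # Hold back a share of the uplink proportional to the download speed,
  # so outgoing ACK packets do not have to compete with upload traffic.
  limit = max_upload - fraction * download_rate
  # Never drop below a small floor, or uploading would stop entirely.
  [limit, floor].max
end

puts upload_limit(40.0, 0.0)    # idle download: the full upload budget is available
puts upload_limit(40.0, 200.0)  # busy download: part of the budget is held back
puts upload_limit(40.0, 500.0)  # hold-back exceeds the budget: the floor applies
```

The floor is a design choice of this sketch: without it, a fast enough download would push the upload cap to zero or below.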
