As a test, I wrote a script that downloads the images posted on 4chan.org. I let it run for one day and ended up with 2.7GB of data across 16661 images (that number gives me some idea of what I have done).
Here’s the code:
#!/usr/bin/ruby
# License: GPLv3 or newer

# Board pages to scan for images.
urls = ['http://img.4chan.org/b/1.html',
        " ... other 4chan boards you want ..." ]

require 'rubygems'
require 'net/http'
require 'thread'
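# Download every image linked from one board page into the current directory.
def download(url)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host) { |h|
    page = h.get(uri.path)
    # Split on href= and newlines, keeping only the pieces that contain a quoted absolute URL.
    data = page.body.split(/(href=|\n)/).select { |x| x =~ /"http/ }
    # Strip the opening quote, then everything from the closing quote onward.
    data = data.map { |x| x.gsub(/^"/, '').gsub(/".*/, '') }
    # Keep only the links that end in an image extension.
    data = data.select { |x| x =~ /(png|jpg|gif)$/ }
    data.uniq.each { |x|
      img = x.gsub(/.*\//, '')  # file name: everything after the last slash
      if !File.exist?(img) then
        puts "downloading #{img}"
        # Reuses the open connection, so the image must be on the same host as the page.
        resp = h.get(URI.parse(x).path)
        File.open(img, "wb") { |f|
          f.write(resp.body)
        }
      end
    }
    # No explicit close here: the block form of Net::HTTP.start closes the
    # connection itself, and Net::HTTP has no close method.
  }
end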
# Start one thread per board page, then wait for all of them to finish.
threads = urls.map { |x| Thread.new { download(x) } }
threads.each { |x|
  begin
    x.join
  rescue StandardError => e
    # A failure on one board should not stop the others; report it and move on.
    warn "thread failed: #{e.message}"
  end
}
puts "finished"
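
The link extraction step in the script above can also be tried on its own, without touching the network. What follows is only a minimal sketch, not the code I actually ran: the helper name extract_image_urls and the example.org URLs are made up for illustration. It assumes links appear as href="http..." in double quotes, and it requires a dot before the extension, which the regex in the script does not. The heredoc syntax needs Ruby 2.3 or newer.

```ruby
# Hypothetical helper, not part of the original script: returns the unique
# absolute image URLs found in double-quoted href attributes of an HTML string.
def extract_image_urls(html)
  # scan returns one single-element array per match (one capture group),
  # so flatten turns the result into a plain array of URL strings.
  html.scan(/href="(http[^"]+\.(?:png|jpg|gif))"/).flatten.uniq
end

# Made-up sample page: one image linked twice, one non-image link, one more image.
sample = <<~HTML
  <a href="http://example.org/src/1.jpg">1.jpg</a>
  <a href="http://example.org/src/1.jpg">thumbnail</a>
  <a href="http://example.org/res/2.html">Reply</a>
  <a href="http://example.org/src/3.png">3.png</a>
HTML

puts extract_image_urls(sample)
```

Run against the sample, this prints the two image URLs once each and skips the link to the reply page.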
November 9, 2009 at 3:40 am
Well mate, we have the same taste. But you know how to program and I don't...
Can the same script be written in Python, PHP, or another scripting language?
See ya!
November 9, 2009 at 11:55 am
It can be done in any language you want.