May 17, 2009
This little Ruby script downloads a whole anime series from lolipower.org, provided the anime page respects the “standards”. It verifies the MD5 checksum of each download and skips episodes that are duplicated or that were already downloaded with a matching checksum.
If an episode is available more than once, the mkv version is preferred. You can also choose the first and last episode to download with command-line arguments.
You can find the link to the project on the code page, or right here:
git://github.com/alka/down-lolipower.git
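The behaviour described above (checksum-based skipping, the mkv preference and the episode range) can be sketched roughly as follows. This is not the code from the repository: the `Episode` struct and the helpers `already_downloaded?` and `select_episodes` are hypothetical names made up for illustration, and the sketch assumes the episode number, file name and MD5 have already been scraped from the anime page.

```ruby
require 'digest/md5'

# Hypothetical record for one downloadable file listed on an anime page.
Episode = Struct.new(:number, :filename, :md5)

# True when the file already exists in dir and its MD5 matches the
# published checksum (hex string, compared case-insensitively).
def already_downloaded?(episode, dir = '.')
  path = File.join(dir, episode.filename)
  File.exist?(path) && Digest::MD5.file(path).hexdigest == episode.md5.downcase
end

# Keep one file per episode number within first..last, preferring the
# .mkv file when the same episode is listed more than once.
def select_episodes(episodes, first, last)
  in_range = episodes.select { |e| e.number.between?(first, last) }
  in_range.group_by(&:number).sort_by { |number, _| number }.map do |_, candidates|
    candidates.find { |e| File.extname(e.filename).downcase == '.mkv' } || candidates.first
  end
end
```

A downloader built on these helpers would call `select_episodes` once with the range taken from the command line, then fetch only the entries for which `already_downloaded?` returns false.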
Ruby, Script | Tagged: download, gem, hash, http, md5, Ruby |
Posted by alkawiz
January 19, 2009
For testing purposes I wrote a script that downloads the images posted on 4chan.org. I let it run for one day and got 2.7 GB of data across 16661 images (the number tells me something about what I have done).
Here's the code:
#! /usr/bin/ruby
# License: GPLv3 or newer

require 'rubygems'
require 'net/http'
require 'uri'
require 'thread'

# Board pages to scan for images
urls = ['http://img.4chan.org/b/1.html',
        # ... other 4chan boards you want ...
       ]
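# Fetch one board page and save every linked png/jpg/gif into the current directory.
def download(url)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host) { |h|
    page = h.get(uri.path)
    # Split on href= and newlines, then keep only the pieces containing a quoted http URL
    data = page.body.split(/(href=|\n)/).delete_if { |x| !(x =~ /\"http/) }
    data = data.map { |x| x.gsub(/^\"/, '') }   # strip the opening quote
    data = data.map { |x| x.gsub(/\".*/, '') }  # drop everything from the closing quote on
    data = data.delete_if { |x| !(x =~ /(png|jpg|gif)$/) }
    data.uniq.each { |x|
      img = x.gsub(/.*\//, '')                  # local file name: last path segment
      if !File.exist?(img) then
        puts "downloading #{img}"
        # Fetch by full URL, since the image may live on a different host than the page
        resp = Net::HTTP.get_response(URI.parse(x))
        File.open(img, "wb") { |f|
          f.write(resp.body)
        }
      end
    }
    # No explicit close: the block form of Net::HTTP.start ends the session itself
  }
end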
# One thread per board page. Thread.new returns the thread object,
# so no mutex or polling loop is needed to collect them.
threads = urls.map { |x| Thread.new { download(x) } }

threads.each { |x|
  begin
    x.join
  rescue StandardError => e
    # join re-raises whatever killed the thread; report it and keep going
    warn "download failed: #{e.message}"
  end
}
puts "finished"
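The link-extraction step of the script can also be isolated into a small function that is easy to try without touching the network. The sketch below is a simplification, not a drop-in replacement: `image_links` is a hypothetical helper name, and it assumes the image URLs appear as absolute, double-quoted `href` values, whereas the script above also picks up quoted URLs found elsewhere on a line.

```ruby
# Hypothetical helper: return the unique absolute image URLs that appear
# as double-quoted href values in an HTML string.
def image_links(html)
  html.scan(/href="(http[^"]+\.(?:png|jpg|gif))"/i).flatten.uniq
end

# Local file name for a URL: everything after the last slash.
def local_name(url)
  url.sub(/.*\//, '')
end
```

For example, `image_links(page.body).each { |u| puts local_name(u) }` would list the file names the downloader is about to write.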
Ruby | Tagged: 4chan, download, http, image, net, page, Ruby |
Posted by Dario Meloni