Lua-based CURL web parser + multithreading. File & UTF-8 API included.

lua534 (reimu)

Lua Web Dumper

I wrote this because at one point I needed a fast way to dump websites, so here it is.

Introduction

Lua534 supports multithreading. These are not coroutines but real threads, each with its own Lua state (20 threads means 20 Lua states). The htmlcxx and curl libraries are also provided.

task table

The task table handles multitasking operations on threads.
setThreadCount(number) -- sets the default number of threads per payload. Useful when you launch 100 jobs but want only 10 threads per payload.
setDelay(msecs) -- sets the delay, in milliseconds, between spawning threads
setTimeOut(msecs) -- sets a timeout, in milliseconds, for a thread. Useful when a thread must be closed after some period of time if it has not finished on its own
getGlobal(name) -- returns a variable from the Global Lua State. Used to read settings or variables from inside Thread Lua States
lockGlobal() -- locks a regular mutex
unlockGlobal() -- unlocks that mutex
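
A minimal sketch of how these calls might be combined. It assumes the functions are fields of the global task table (as task.setThreadCount is in the Example) and that lockGlobal and unlockGlobal guard one shared mutex. The helper names configureTasks and withGlobalLock are hypothetical and not part of reimu.

```lua
-- Hypothetical helper: apply the thread settings in one place.
-- delayMs and timeoutMs are in milliseconds, as documented above.
function configureTasks(threads, delayMs, timeoutMs)
	task.setThreadCount(threads)
	task.setDelay(delayMs)
	task.setTimeOut(timeoutMs)
end

-- Hypothetical helper: run fn while holding the mutex and always release it,
-- even if fn raises an error. Returns the first result of fn.
function withGlobalLock(fn, ...)
	task.lockGlobal()
	local ok, result = pcall(fn, ...)
	task.unlockGlobal()
	if not ok then
		error(result, 0) -- re-raise only after unlocking
	end
	return result
end
```

Inside a thread function this could be used to keep console output from interleaving, for example withGlobalLock(print, "done").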

file table

Some file utilities.
exists(path) -- checks whether a file or directory exists
mkdir(name) -- creates a directory
stat(path) -- wraps the C stat function. Returns a table mirroring C's struct stat
size(path) -- returns the file size
remove(path) -- removes a file or directory
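
A short sketch of typical use, assuming these functions live in the global file table (as file.mkdir does in the Example) and that file.exists returns a truthy value for existing paths. The return value of mkdir is not documented, so it is ignored here. The helper names ensureDir and sizeOrNil are hypothetical.

```lua
-- Hypothetical helper: create a directory only when it is missing.
function ensureDir(path)
	if not file.exists(path) then
		file.mkdir(path)
	end
	return path
end

-- Hypothetical helper: size of a file, or nil when the path does not exist.
function sizeOrNil(path)
	if file.exists(path) then
		return file.size(path)
	end
	return nil
end
```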

u8 table

UTF-8 and UTF-16 conversion functions
print(utf16text) -- prints UTF-16 text
scan(bufsize) -- reads UTF-16 input from the console
write(file,utf16text) -- writes UTF-16 text to a file
conv_u16(utf16text) -- converts UTF-16 to UTF-8
conv_u8(text) -- converts UTF-8 to UTF-16
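
Each conversion function is named after its input encoding, which is easy to mix up. The sketch below shows the direction of each call, assuming the functions are fields of a global u8 table. The helper names printUtf8 and readUtf8 and the default buffer size are hypothetical.

```lua
-- Hypothetical helper: print a UTF-8 Lua string on the console.
function printUtf8(text)
	u8.print(u8.conv_u8(text)) -- conv_u8: UTF-8 in, UTF-16 out
end

-- Hypothetical helper: read console input and return it as UTF-8.
-- bufsize is passed straight to u8.scan; 256 is an arbitrary default.
function readUtf8(bufsize)
	return u8.conv_u16(u8.scan(bufsize or 256)) -- conv_u16: UTF-16 in, UTF-8 out
end
```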

global functions

tohtml(html_text) -- returns an HTML object

HTML's Object Metatable

contentOf(child) -- returns the content of child
toTable() -- returns a table of the HTML object's children
byTagName(tagname) -- returns a table of the tags named tagname
getChildsOf(child) -- returns a table of the children of child
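
A minimal sketch of link extraction with byTagName. It assumes the tags it returns expose attribute(name), as the nodes returned by toTable() do in the Example. That method is not listed above, so treat it as an assumption. The helper name collectLinks is hypothetical.

```lua
-- Hypothetical helper: collect the href attribute of every <a> tag.
function collectLinks(htmlText)
	local links = {}
	local doc = tohtml(htmlText)
	for _, tag in pairs(doc:byTagName("a")) do
		local href = tag:attribute("href") -- assumed to return nil when absent
		if href ~= nil then
			links[#links + 1] = href
		end
	end
	return links
end
```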

global functions

curl_open() -- returns a CURL handle object

CURL's Object Metatable

close() -- closes the handle
setOpt(CURLOPT_,arg) -- sets a curl option. arg can be a string or a number. Supported CURLOPT_*:

  • CURLOPT_URL
  • CURLOPT_PORT
  • CURLOPT_POST
  • CURLOPT_PROXY
  • CURLOPT_VERBOSE
  • CURLOPT_TIMEOUT
  • CURLOPT_USERAGENT
  • CURLOPT_POSTFIELDS
  • CURLOPT_AUTOREFERER
  • CURLOPT_REFERER
  • CURLOPT_COOKIE
  • CURLOPT_COOKIEFILE
  • CURLOPT_COOKIEJAR
  • CURLOPT_COOKIELIST
  • CURLOPT_FOLLOWLOCATION
  • CURLOPT_CONNECTTIMEOUT

getInfo(CURLINFO_) -- returns curl info. Supported CURLINFO_*:

  • CURLINFO_RESPONSE_CODE
  • CURLINFO_HTTP_CODE
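
A sketch of configuring and releasing a handle, assuming the CURLOPT_* and CURLINFO_* names are global constants (as CURLOPT_URL is in the Example) and that values are passed through to libcurl unchanged. The helper names newHandle and closeWithCode are hypothetical. The transfer itself is not shown; the Example performs it with _performCurl from base.lua.

```lua
-- Hypothetical helper: open a handle and apply a set of options.
-- options maps CURLOPT_* constants to string or number values.
function newHandle(url, options)
	local curl = curl_open()
	curl:setOpt(CURLOPT_URL, url)
	for opt, value in pairs(options or {}) do
		curl:setOpt(opt, value)
	end
	return curl
end

-- Hypothetical helper: read the response code, then close the handle.
function closeWithCode(curl)
	local code = curl:getInfo(CURLINFO_RESPONSE_CODE)
	curl:close()
	return code
end
```

A call might look like newHandle(URL, { [CURLOPT_FOLLOWLOCATION] = 1, [CURLOPT_TIMEOUT] = 30 }), followed by the transfer and then closeWithCode(curl).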

global functions

performMultiTask(thread_function,args) -- launches a Lua thread for each element of the table. Note that each thread has its own Lua state, so variables from the Global Lua State must be accessed via task.getGlobal(name). If you called dofile in the Global Lua State, you must call it again in the Thread Lua State
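
A minimal sketch of the state separation described above, assuming the thread function receives one table element as its argument (as download does in the Example). The names PREFIX, worker and runAll are hypothetical.

```lua
-- Defined in the Global Lua State: a setting the threads will need.
PREFIX = "item: "

-- Thread function: runs in its own Lua state, once per table element.
function worker(item)
	-- Read the setting through task.getGlobal, as required for globals
	-- that were defined in the Global Lua State.
	local prefix = task.getGlobal("PREFIX")
	print(prefix .. tostring(item))
end

-- Hypothetical wrapper: launch one thread per element of items.
function runAll(items)
	performMultiTask(worker, items)
end
```

Calling runAll({"a", "b", "c"}) would then launch three threads, each printing its own element.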

Example

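-- Links on the target page look like:
-- <a href="/img/Lena/-4quaSExzHc.jpg" target="_blank">
dofile("base.lua") -- helper script; _performCurl and _performFileCurl are assumed to come from it
task.setThreadCount(10)

USERAGENT = "GM9 REVCOUNCIL"
URL = "https://anonymus-lenofag.github.io"
MAXERRORS = 5

-- Fetch the index page in the Global Lua State
curl = curl_open()
curl:setOpt(CURLOPT_USERAGENT,USERAGENT)
curl:setOpt(CURLOPT_URL,URL)
data,res,code = _performCurl(curl,MAXERRORS)
curl:close()

-- Thread function: runs in its own Lua state, once per picture path
function download(pic)
	dofile("base.lua") -- must be loaded again in each Thread Lua State
	local _,k = pic:find("Lena/")
	-- pic:sub(k) starts at the "/" that ends "Lena/", giving "lena/<name>"
	local fpic = io.open("lena"..pic:sub(k),"wb")
	local curl = curl_open()

	-- Settings live in the Global Lua State, so read them via task.getGlobal
	curl:setOpt(CURLOPT_URL,task.getGlobal("URL")..pic)
	curl:setOpt(CURLOPT_USERAGENT,task.getGlobal("USERAGENT"))
	local res,code = _performFileCurl(curl,fpic,task.getGlobal("MAXERRORS"))
	if res == 0 then print(pic) end
	fpic:close()
	curl:close()
end

-- Collect the href of every <a> tag that has both href and target
pics = {}
prs = tohtml(data)
for k,v in pairs(prs:toTable()) do
	local href = v:attribute("href")
	local target = v:attribute("target")
	if v:tagName() == "a" and href ~= nil
		and target ~= nil then
		pics[#pics+1] = href
	end
end

file.mkdir("lena/")
-- Launch one download thread per collected path
performMultiTask(download,pics)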

P.S.

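The Global Lua State also provides an args table holding the command-line arguments. Running the command below with the script that follows prints the items listed after it: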
reimu.exe myscript.lua arg1 arg2 arg3

for k,v in pairs(args) do
	print(v)
end
  • myscript.lua
  • arg1
  • arg2
  • arg3