--[===[
MODULE "MCHKLNGCODE" (check language code)
"eo.wiktionary.org/wiki/Modulo:mchklngcode" <!--2024-Aug-27-->
"id.wiktionary.org/wiki/Modul:mchklngcode"
Purpose: checks validity of 1 or 2 parameters that are supposed to
contain language code in 2 stages: by testing whether they
are obviously invalid, and if this does not apply whether
they are known
Utilo: kontrolas validecon de 1 aux 2 parametroj kiuj enhavu
lingvokodon en 2 pasxoj: testante cxu ili estas evidente
nevalidaj, kaj se tio ne veras cxu ili estas konataj
Manfaat: mengontrol validitas 1 atau 2 parameter yang seharusnya
berisi kode bahasa ...
Syfte: kontrollerar giltighet av 1 eller 2 parametrar som ska innehaalla
spraakkod ...
Used by templates / Uzata far sxablonoj:
* deveno3 elpropra Lingvo t
Required submodules / Bezonataj submoduloj / Submodul yang diperlukan:
* "loaddata-tbllingvoj" T76 in turn requiring template "tbllingvoj" (EO)
* "loaddata-tblbahasa" T76 in turn requiring template "tblbahasa" (ID)
This module is unusual in that it reads parameters from two sources:
those sent to itself (own frame) and those sent to the caller (caller's frame).
The parameters it needs differ from the parameters
submitted to the calling template. Self-test is still possible.
!!! BEWARE control string is taken only if exactly one of xx= and yy=
!!! is available, and has correct length, otherwise silently ignored
!!! BEWARE the option to allow a digit in middle position via xx= has been removed
Incoming: - 1 or 2 anonymous parameters
- 1 or 2 parameters forwarded from the caller using "{{{1}}}"
or "{{{ling}}}" or similarly, wall "|" is not needed,
maybe "{{{ling|eo}}}" for an optional parameter,
conversely "{{{ling|}}}" is bad
- 1 optional named parameter
* "yy=" control string with 8 char:s and 7 values, (one tristate
letter, 6 boolean digits "0" or "1", and one separator "-",
pattern ".1-11111")
* (pos 0) desired type of result b t k
* (pos 1) do check 2 codes 0 1
* (pos 2) separator "-"
* (pos 3) allow "-" 0 1
* (pos 4) allow "??" 0 1
* (pos 5) allow long codes such as "zh-min-nan" 0 1
* (pos 6) allow digit in middle position of 3-letter codes 0 1
* (pos 7) skip test against ban table 0 1
* "xx=" control string with 5 char:s (one tristate letter,
3 boolean digits "0" or "1", and one fourstate digit) !!!FIXME!!! deprecated
- tri-state letter : desired type of result (default is "b"):
- "b" -- boolean (0 evil -- 1 tolerable) for conditional logic
in classic templates ("evil" is invalid,
"tolerable" is unknown or known)
- "t" -- tristate (0 invalid -- 1 unknown -- 2 known)
- "k" -- category (up to 3 categories per code without EOL between
them if applicable, or empty string if known, see below)
- boolean: do check 2 codes (by default only 1 code is checked)
- fourstate digit: allow "-" or "??" (default "0")
- "0" -- do not allow any
- "1" -- allow "-"
- "2" -- allow "??"
- "3" -- allow both
- boolean: allow digit THIS IS REMOVED NOW
- boolean: do NOT disallow some common bad codes ("epo", "por",
...) by ban table (default false ie do disallow)
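As a rough illustration of the "yy=" layout described above, the following
standalone Lua sketch decodes such a control string into a table. The function
name "decodeyy" and all field names are hypothetical and not part of this
module, which parses the string byte by byte instead. Unlike the module, the
sketch does not distinguish "wrong length, silently ignored" from "bad
character, rejected" and simply returns "nil" in both cases.

```lua
-- Hypothetical helper, NOT part of this module: decode a "yy=" control string.
-- Returns a table of settings, or nil if the string is malformed.
local function decodeyy (stryy)
  if (type(stryy)~='string') then
    return nil
  end
  -- one letter b/t/k, one boolean, separator "-", five booleans (8 char:s)
  local strmode, strtwo, strrest = string.match (stryy, '^([btk])([01])%-([01][01][01][01][01])$')
  if (strmode==nil) then
    return nil
  end
  return {
    mode       = strmode,                          -- pos 0 desired type of result
    checktwo   = (strtwo=='1'),                    -- pos 1 do check 2 codes
    allowdash  = (string.sub(strrest,1,1)=='1'),   -- pos 3 allow "-"
    allowqq    = (string.sub(strrest,2,2)=='1'),   -- pos 4 allow "??"
    allowlong  = (string.sub(strrest,3,3)=='1'),   -- pos 5 allow long codes
    allowdigit = (string.sub(strrest,4,4)=='1'),   -- pos 6 allow digit in middle
    skipban    = (string.sub(strrest,5,5)=='1'),   -- pos 7 skip ban table
  }
end
```

For example "decodeyy('t1-01010')" would request tristate output, check both
codes, and allow "??" as well as a digit in middle position.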
Parameters accepted from both own and caller's frame:
* 2 named optional hidden parameters
* "detxt=true" (dec-encode AKA nowiki-encode the output and
that way make the category insertions on error visible,
any other value is ignored)
* "nocat=true" (suppress categorization in "k" mode,
any other value is ignored, also ignored if "detxt=true"
since "detxt=" overrides "nocat=") !!!FIXME!!! deprecated
Returned: - "b" : string "1" if the parameters/codes are tolerable (known or
unknown), string "0" if the parameters/codes are evil
(obviously invalid), or this module itself becomes
victim of misuse
- "t" : string "2" if the parameters/codes are known (both known),
string "1" if the parameters/codes are unknown but not
obviously invalid, string "0" if the parameters/codes
are invalid (at least one is obviously invalid), or this
module itself becomes victim of misuse
- "k" : up to 3 categories per code (of 2 possible types) without EOL between
them, or empty string if the parameters are accepted,
no junk categories if this module itself becomes
victim of misuse
For "b" and "t" the principle is "the worst result counts", but for "k"
the 2 triples of categories are based on separate evaluations of the 2 codes.
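The "worst result counts" principle for modes "b" and "t" can be sketched as
taking the minimum of the two tri-state values. This is only an illustration
under the assumption that 0 means invalid, 1 unknown and 2 known as listed
above; the function names "combinestates" and "statetostring" are hypothetical
and do not exist in this module.

```lua
-- Hypothetical sketch, NOT part of this module.
-- Tri-state values: 0 invalid, 1 unknown, 2 known.
local function combinestates (numleft, numright)
  return math.min (numleft, numright) -- the worst (lowest) result counts
end

-- Map a combined tri-state to the string returned in "b" or "t" mode.
local function statetostring (strmode, numstate)
  if (strmode=='b') then
    if (numstate>=1) then
      return '1' -- tolerable (unknown or known)
    end
    return '0' -- evil
  end
  return tostring (numstate) -- "t" mode: "0" or "1" or "2"
end
```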
To pass the check for obviously invalid codes (result "pass"),
a code must meet these requirements:
- must be 2 or 3 ASCII char:s long, and consist only of lowercase letters
(optionally, digit in the middle position or long codes can be allowed)
- must not be on the ban list (this check can optionally be deactivated)
- optionally string "-" or "??" (but not "???") can be allowed in this
stage (but still cannot be accepted later as "known")
La kontrolo pri evidente nevalida kode postulas
por redoni rezulton "tolerebla":
- longo estu 2 aux 3 ASCII signoj, kaj enhavu nur minusklajn literojn
(opcie, cifero en la meza pozicio aux longaj kodoj povas esti permesitaj)
- ne trovigxu sur la forbara listo (cxi tiu kontrolo povas opcie
esti senaktivigita)
- opcie signocxeno "-" aux "??" (sed ne "???") povas esti permesita en cxi tiu
pasxo (sed dauxre ne povas esti akceptita pli tarde kiel "konata")
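The rules above can be sketched in simplified form as follows. This is not the
validator used by this module: long codes such as "zh-min-nan" are deliberately
left out, the ban list is a tiny sample, and the names "isplausible" and
"tabbanned" are hypothetical.

```lua
-- Hypothetical simplified sketch, NOT the module's own validator.
-- Long codes such as "zh-min-nan" are deliberately left out.
local tabbanned = { epo = true, por = true, fra = true } -- tiny sample ban list

local function isplausible (strcode, booallowdash, booallowqq, booallowdigit, booskipban)
  if (strcode=='-') then
    return (booallowdash==true)
  end
  if (strcode=='??') then
    return (booallowqq==true)
  end
  -- 2 or 3 lowercase ASCII letters
  local booshape = (string.match (strcode, '^[a-z][a-z][a-z]?$')~=nil)
  if ((not booshape) and booallowdigit) then
    -- 3 char:s with a digit in the middle position only
    booshape = (string.match (strcode, '^[a-z][0-9][a-z]$')~=nil)
  end
  if (not booshape) then
    return false
  end
  if ((not booskipban) and tabbanned[strcode]) then
    return false -- on the ban list
  end
  return true
end
```

A code that passes here is merely "not obviously invalid"; whether it is
"known" still has to be decided against the language table of the wiki.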
Note that the operation modes "b" and "t" on one side and "k" on the other
are separate and CANNOT be merged. The result from "b" is fed into
"#ifeq", where any categories would be ignored. Thus this module is
usually called several times from one template, even with the same language code.
Note that the result in boolean mode "b" is either "1" ("accepted") or "0" ("bad")
once this module has run successfully. But there is a third possibility, "module
failed to run", for example because it was not found or timed out. The
conditional logic in the calling template must account for this.
In the category mode "k" the format of the categories is:
* obviously invalid:
* "[[Kategorio:Evidente nevalida lingvokodo]]"
* "[[Kategorio:Evidente nevalida lingvokodo nome (Deutsch)]]"
* "[[Kategorio:Evidente nevalida lingvokodo loke (deveno3)]]"
or
* unknown (unsupported by given wiki at the moment)
* "[[Kategorio:Nekonata lingvokodo]]"
* "[[Kategorio:Nekonata lingvokodo nome (haw)]]"
* "[[Kategorio:Nekonata lingvokodo loke (deveno3)]]"
The reported detail string is sanitized for both incoming langcode
and peeked template name:
* replaced with "e-m-p-t-y" if empty
* otherwise truncated to max 14 octet:s and unsafe char:s are
replaced with dot:s (safe are "0"..."9" "A"..."Z" "a"..."z"
"!" "," "-")
The code does not have to be sanitized if it is only "unknown",
but must be if it is "obviously invalid", we sanitize always.
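The sanitization described above can be approximated with a single pattern
substitution. This is a sketch only: the module itself loops over the bytes
one by one, and the name "sanitizedetail" is hypothetical. Every unsafe octet,
including each octet of a multi-byte UTF8 char, becomes one dot.

```lua
-- Hypothetical sketch, NOT the module's own routine (which loops over bytes).
local function sanitizedetail (strraw)
  if (strraw=='') then
    return 'e-m-p-t-y'
  end
  local strcut = string.sub (strraw, 1, 14) -- truncate to max 14 octets
  -- keep "0"..."9" "A"..."Z" "a"..."z" "!" "," "-", turn everything else into "."
  local strsafe = string.gsub (strcut, '[^0-9A-Za-z!,%-]', '.')
  return strsafe
end
```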
The name of the caller ie parent ie previous page in the calling chain
(presumably a template) is peeked automatically and only the core (without
namespace prefix) is taken. Note that this is NOT the same as "{{PAGENAME}}"
returning the very last page in calling chain (usually in NS ZERO).
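Stripping the namespace prefix from the caller's title can be sketched as
below. The function name "titlecore" is hypothetical; in this module the title
comes from "getParent():getTitle()" on the frame, whereas the sketch takes a
plain string so that it can be tried outside MediaWiki.

```lua
-- Hypothetical sketch, NOT part of this module.
-- Strip the namespace prefix from a full page title such as "Template:deveno3".
local function titlecore (strtitle)
  local numcolon = string.find (strtitle, ':', 1, true) -- plain text search
  if ((numcolon==nil) or (numcolon==1) or (numcolon==string.len(strtitle))) then
    return '' -- no colon, or colon at an invalid position
  end
  return string.sub (strtitle, numcolon+1) -- everything after the first colon
end
```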
If two codes are tested then two separate triples of categories can be
created of same type or of different types (one invalid and one unknown).
This module makes it possible to mostly separate cases of "obviously invalid
language code" (for example "" (empty), "...", "Deutsch", "De", "FR", "taja", ...)
from "unknown language code" (for example "haw", which is valid according to
"ISO 639-3:2007" but might be missing from the list of languages on a given wiki)
{{hr3}} <!-------------------------------->
* #T00 (no params, evil)
* expected result: "0" (evil)
* actual result: "{{#invoke:mchklngcode|ek}}"
::* #T01 ("eo", default binary output, only 1 code is tested)
::* expected result: "1" (tolerable)
::* actual result: "{{#invoke:mchklngcode|ek|eo}}"
* #T02 ("eo|crap", default binary output, only 1 code is tested)
* expected result: "1" (tolerable)
* actual result: "{{#invoke:mchklngcode|ek|eo|crap}}"
::* #T03 ("eo|sv|id", 3 anon params, evil)
::* expected result: "0" (evil)
::* actual result: "{{#invoke:mchklngcode|ek|eo|sv|id}}"
* #T04 ("eo|xx=b0000", all 5 defaults explicitly confirmed, binary output)
* expected result: "1" (tolerable)
* actual result: "{{#invoke:mchklngcode|ek|eo|xx=b0000}}"
* #T04 ("eo|yy=b0-00000", all 8 defaults explicitly confirmed, binary output)
* expected result: "1" (tolerable)
* actual result: "{{#invoke:mchklngcode|ek|eo|yy=b0-00000}}"
::* #T05 ("eo|xx=b00000", parameter too long)
::* expected result: "1" (bad, parameter "xx=" ignored)
::* actual result: "{{#invoke:mchklngcode|ek|eo|xx=b00000}}"
::* #T05 ("eo|yy=b0-000000", parameter too long)
::* expected result: "1" (bad, parameter "yy=" ignored)
::* actual result: "{{#invoke:mchklngcode|ek|eo|yy=b0-000000}}"
* #T06 ("eo|xx=b2000", invalid digit "2" in boolean position)
* expected result: "0" (bad, parameter "xx=" rejected)
* actual result: "{{#invoke:mchklngcode|ek|eo|xx=b2000}}"
* #T06 ("eo|yy=b0-00200", invalid digit "2" in boolean position)
* expected result: "0" (bad, parameter "yy=" rejected)
* actual result: "{{#invoke:mchklngcode|ek|eo|yy=b0-00200}}"
::* #T07 ("eo|crap|xx=b1000", both codes are tested)
::* expected result: "0" (bad, latter code is invalid)
::* actual result: "{{#invoke:mchklngcode|ek|eo|crap|xx=b1000}}"
::* #T07 ("eo|crap|yy=b1-00000", both codes are tested)
::* expected result: "0" (bad, latter code is invalid)
::* actual result: "{{#invoke:mchklngcode|ek|eo|crap|yy=b1-00000}}"
{{hr3}} <!-------------------------------->
* #T10 ("eo|haw|xx=b1000", both codes are tested)
* expected result: "1" (good)
* actual result: "{{#invoke:mchklngcode|ek|eo|haw|xx=b1000}}"
::* #T11 ("eo|??|xx=b1000", both codes are tested, "??" prohibited)
::* expected result: "0" (bad)
::* actual result: "{{#invoke:mchklngcode|ek|eo|??|xx=b1000}}"
* #T12 ("eo|??|xx=b1200", both codes are tested, "??" allowed)
* expected result: "1" (good)
* actual result: "{{#invoke:mchklngcode|ek|eo|??|xx=b1200}}"
::* #T13 ("por|xx=b0000", binary output, "por" expl prohibited)
::* expected result: "0" (evil)
::* actual result: "{{#invoke:mchklngcode|ek|por|xx=b0000}}"
* #T14 ("por|xx=b0001", binary output, "por" allowed)
* expected result: "1" (tolerable)
* actual result: "{{#invoke:mchklngcode|ek|por|xx=b0001}}"
::* #T15 ("eo|z|xx=b1101", both codes are tested, right "z" is bad)
::* expected result: "0" (evil)
::* actual result: "{{#invoke:mchklngcode|ek|eo|z|xx=b1101}}"
* #T16 ("z|eo|xx=b1101", both codes are tested, left "z" is bad)
* expected result: "0" (evil)
* actual result: "{{#invoke:mchklngcode|ek|z|eo|xx=b1101}}"
::* #T17 ("epo|eo|xx=b1101", both codes are tested, "epo" allowed)
::* expected result: "1" (tolerable)
::* actual result: "{{#invoke:mchklngcode|ek|epo|eo|xx=b1101}}"
{{hr3}} <!-------------------------------->
* #T20 ("id||xx=b1101", both codes are tested, empty param is bad)
* expected result: "0" (bad)
* actual result: "{{#invoke:mchklngcode|ek|id||xx=b1101}}"
::* #T21 ("id||xx=b0101", only one code is tested, empty param is bad but ignored)
::* expected result: "1" (good)
::* actual result: "{{#invoke:mchklngcode|ek|id||xx=b0101}}"
* #T22 ("|id|xx=b0101", only one code is tested, empty early param is bad)
* expected result: "0" (bad)
* actual result: "{{#invoke:mchklngcode|ek||id|xx=b0101}}"
::* #T23 ("t8i|xx=b0000", digits prohibited as default)
::* expected result: "0" (bad)
::* actual result: "{{#invoke:mchklngcode|ek|t8i|xx=b0000}}"
* #T24 ("t8i|xx=b0010", digits permitted)
* expected result: "1" (good)
* actual result: "{{#invoke:mchklngcode|ek|t8i|xx=b0010}}"
{{hr3}} <!-------------------------------->
* #T30 ("grc|xx=t0000", tristate)
* expected result: "2" (good and known)
* actual result: "{{#invoke:mchklngcode|ek|grc|xx=t0000}}"
::* #T31 ("t8i|xx=t0010", tristate, digits permitted)
::* expected result: "2" (good and known) or "1" (valid but unknown)
::* actual result: "{{#invoke:mchklngcode|ek|t8i|xx=t0010}}"
* #T32 ("??|xx=t0200", tristate, "??" is allowed)
* expected result: "1" (valid but unknown)
* actual result: "{{#invoke:mchklngcode|ek|??|xx=t0200}}"
::* #T33 ("???|xx=t0200", tristate, "??" is allowed but "???" is NOT)
::* expected result: "0" (obviously invalid)
::* actual result: "{{#invoke:mchklngcode|ek|???|xx=t0200}}"
* #T34 ("fra|xx=t0000", tristate, this code is expl banned)
* expected result: "0" (obviously invalid)
* actual result: "{{#invoke:mchklngcode|ek|fra|xx=t0000}}"
::* #T35 ("fra|xx=t0001", tristate, this code is expl banned but we do not care)
::* expected result: "1" (valid but unknown)
::* actual result: "{{#invoke:mchklngcode|ek|fra|xx=t0001}}"
{{hr3}} <!-------------------------------->
* #T40 ("f3i|xx=t0000", tristate, digits prohibited by default)
* expected result: "0" (obviously invalid)
* actual result: "{{#invoke:mchklngcode|ek|f3i|xx=t0000}}"
::* #T41 ("f3i|xx=t0010", tristate, digits permitted)
::* expected result: "1" (valid but unknown)
::* actual result: "{{#invoke:mchklngcode|ek|f3i|xx=t0010}}"
* #T42 ("fi3|xx=t0010", tristate, digits permitted but only in middle position)
* expected result: "0" (obviously invalid)
* actual result: "{{#invoke:mchklngcode|ek|fi3|xx=t0010}}"
* #T43 ("3fi|xx=t0010", tristate, digits permitted but only in middle position)
* expected result: "0" (obviously invalid)
* actual result: "{{#invoke:mchklngcode|ek|3fi|xx=t0010}}"
{{hr3}} <!-------------------------------->
* #T50 ("grc|xx=k0000", 4 defaults explicitly confirmed, category mode)
* expected result: "" (empty string, good)
* actual result: "{{#invoke:mchklngcode|ek|grc|xx=k0000}}"
* #T51 ("fri|xx=k0000|detxt=true", 4 defaults explicitly confirmed, category mode)
* expected result: N/A (valid but unknown, categories)
* actual result: "{{#invoke:mchklngcode|ek|fri|xx=k0000|detxt=true}}"
* #T52 ("fori|xx=k0000|detxt=true", 4 defaults explicitly confirmed, category mode)
* expected result: N/A (obviously invalid, categories)
* actual result: "{{#invoke:mchklngcode|ek|fori|xx=k0000|detxt=true}}"
<pre>
* #T53 ("fri|xx=k0000", 4 defaults explicitly confirmed, category mode)
* expected result: N/A (valid but unknown, categories)
* actual result: "{{#invoke:mchklngcode|ek|fri|xx=k0000}}"
* #T54 ("fori|xx=k0000", 4 defaults explicitly confirmed, category mode)
* expected result: N/A (obviously invalid, categories)
* actual result: "{{#invoke:mchklngcode|ek|fori|xx=k0000}}"
</pre>
* note that tests #T20 ... #T22 use empty parameters
* note that tests #T53 and #T54 cannot be executed on the docs subpage
{{hr3}} <!-------------------------------->
]===]
local exporttable = {}
------------------------------------------------------------------------
---- CONSTANTS [O] ----
------------------------------------------------------------------------
-- uncommentable EO vs ID constant strings (core site-related features, "constrpriv" NOT needed)
local constringvoj = "Modulo:loaddata-tbllingvoj" -- EO
-- local constringvoj = "Modul:loaddata-tblbahasa" -- ID
local constrneva = "Kategorio:Evidente nevalida lingvokodo" -- EO -- no brackets ("[[","]]") here
-- local constrneva = "Kategori:Kode bahasa jelas-jelas tidak valid" -- ID -- no brackets ("[[","]]") here
local constrneko = "Kategorio:Nekonata lingvokodo" -- EO -- no brackets ("[[","]]") here
-- local constrneko = "Kategori:Kode bahasa tidak diketahui" -- ID -- no brackets ("[[","]]") here
-- constant table -- ban list -- add obviously invalid language codes (2-letter or 3-letter) only
-- length of the list is NOT stored anywhere, the processing stops
-- when type "nil" is encountered, used by "lfivalidatelnkoadv" only
-- controversial codes (sh sr hr), (zh cmn)
-- "en.wiktionary.org/wiki/Wiktionary:Language_treatment" excluded languages
-- "en.wikipedia.org/wiki/Spurious_languages"
-- "iso639-3.sil.org/code/art" only valid in ISO 639-2
-- "iso639-3.sil.org/code/gem" only valid in ISO 639-2 and 639-5, "collective"
-- "iso639-3.sil.org/code/zxx" "No linguistic content"
local contabisbanned = {}
contabisbanned = {'by','dc','ll','jp','art','deu','eng','epo','fra','gem','ger','ido','lat','por','rus','spa','swe','tup','zxx'} -- 1...19
-- emergency brake (6 binary digits: nevagene,nevanome,nevaloke,nekogene,nekonome,nekoloke)
local constrfilter = "111111" -- change one or several digits to ZERO to prevent categorization
------------------------------------------------------------------------
---- SPECIAL STUFF OUTSIDE MAIN [B] ----
------------------------------------------------------------------------
---- SPECIAL VAR:S ----
local qldingvoj = {} -- type "table" and nested
local qbooguard = false -- only for the guard test, pass to other var ASAP
---- GUARD AGAINST INTERNAL ERROR AND IMPORT ONE VIA LOADDATA ----
qbooguard = (type(constringvoj)~='string') or (type(constrneva)~='string') or (type(constrneko)~='string')
if (not qbooguard) then
qbooguard = (constringvoj=='') or (constrneva=='') or (constrneko=='')
end--if
if (not qbooguard) then
qldingvoj = mw.loadData(constringvoj) -- can crash here
qbooguard = (type(qldingvoj)~='table') -- seems to be always false
end--if
------------------------------------------------------------------------
---- LOW LEVEL STRING FUNCTIONS [G] ----
------------------------------------------------------------------------
-- Local function LFGSTRINGRANGE
local function lfgstringrange (varvictim, nummini, nummaxi)
local nummylengthofstr = 0
local booveryvalid = false -- preASSume guilt
if (type(varvictim)=='string') then
nummylengthofstr = string.len(varvictim)
booveryvalid = ((nummylengthofstr>=nummini) and (nummylengthofstr<=nummaxi))
end--if
return booveryvalid
end--function lfgstringrange
------------------------------------------------------------------------
-- test whether char is an ASCII digit "0"..."9", return boolean
local function lfgtestnum (numkaad)
local boodigit = false
boodigit = ((numkaad>=48) and (numkaad<=57))
return boodigit
end--function lfgtestnum
------------------------------------------------------------------------
-- test whether char is an ASCII uppercase letter, return boolean
local function lfgtestuc (numkode)
local booupperc = false
booupperc = ((numkode>=65) and (numkode<=90))
return booupperc
end--function lfgtestuc
------------------------------------------------------------------------
-- test whether char is an ASCII lowercase letter, return boolean
local function lfgtestlc (numcode)
local boolowerc = false
boolowerc = ((numcode>=97) and (numcode<=122))
return boolowerc
end--function lfgtestlc
------------------------------------------------------------------------
-- Local function LFGIS62SAFE
-- Test whether incoming ASCII char is very safe (0...9 A...Z a...z).
-- Depends on functions :
-- [G] lfgtestnum lfgtestuc lfgtestlc
local function lfgis62safe (numcxair)
local booguud = false
booguud = lfgtestnum (numcxair) or lfgtestuc (numcxair) or lfgtestlc (numcxair)
return booguud
end--function lfgis62safe
------------------------------------------------------------------------
---- HIGH LEVEL STRING FUNCTIONS [I] ----
------------------------------------------------------------------------
-- Local function LFIFIXUNSAFE
-- Fix dangerous string (obviously invalid language code or whatever) so that
-- it can at least be reported (used inside name of a tracking category).
-- Input : * strbahaya
-- Output : * strfixed
-- Depends on functions :
-- [G] lfgtestnum lfgtestuc lfgtestlc lfgis62safe
-- # empty string replaced with "e-m-p-t-y"
-- # truncated to 14 octet:s if longer
-- # unsafe char:s are replaced with dot:s (safe are only "0"..."9" and
-- "A"..."Z" and "a"..."z" and "!" and "," and "-" and maybe ".")
local function lfifixunsafe (strbahaya)
local strfixed = ""
local numlencx = 0
local numcxaar = 0
local numuindex = 1 -- ONE-based
local boogood = false
if (strbahaya=="") then
strfixed = "e-m-p-t-y"
else
numlencx = math.min (string.len (strbahaya), 14)
while true do
if (numuindex>numlencx) then
break
end--if
numcxaar = string.byte (strbahaya,numuindex,numuindex)
boogood = lfgis62safe (numcxaar) -- 0...9 A...Z a...z
if (numcxaar==33) then
boogood = true -- "!"
end--if
if ((numcxaar>=44) and (numcxaar<=46)) then
boogood = true -- ",-" -- FYI: 46 is the dot "."
end--if
if (not boogood) then
numcxaar = 46 -- replace by dot "."
end--if
strfixed = strfixed .. string.char (numcxaar)
numuindex = numuindex + 1
end--while
end--if
return strfixed
end--function lfifixunsafe
------------------------------------------------------------------------
-- Local function LFIDECENCODMIN
-- Minimally encode char:s to prevent parsing. Our cool module has brewed
-- something with "[["..."]]" but we want to see plain text for debugging
-- purposes. This is the most dumb version that dec-encodes all ASCII and
-- does not expect esoteric ASCII values or broken UTF8 stream.
-- Input : * strkrampdang -- string, empty tolerable, but type "nil" is NOT
-- Output : * strmincod -- string, empty in worst case
local function lfidecencodmin (strkrampdang)
local strmincod = ''
local numstrlen = 0
local numpeekinx = 1 -- ONE-based index
local numchmiar = 0
numstrlen = string.len (strkrampdang)
while true do
if (numpeekinx>numstrlen) then
break
end--if
numchmiar = string.byte (strkrampdang,numpeekinx,numpeekinx)
numpeekinx = numpeekinx + 1
if (numchmiar>127) then
strmincod = strmincod .. string.char (numchmiar) -- pass UTF8
else
strmincod = strmincod .. '&#' .. tostring (numchmiar) .. ';' -- encode ASCII
end--if
end--while
return strmincod
end--function lfidecencodmin
------------------------------------------------------------------------
-- Local function LFIVALIDATELNKOADV
-- Advanced test whether a string (intended to be a langcode) is valid
-- containing only 2 or 3 lowercase letters, or 2...10 char:s and with some
-- dashes, or maybe a digit in middle position or maybe instead equals to "-"
-- or "??" and maybe additionally is not included on the ban list.
-- Input : * strqooq -- string (empty is useless and returns
-- "false" ie "invalid" but cannot cause any major harm)
-- * booyesdsh -- "true" to allow special code dash "-"
-- * booyesqst -- "true" to allow special code doublequest "??"
-- * booloonkg -- "true" to allow long codes such as "zh-min-nan"
-- * boodigit -- "true" to allow digit in middle position
-- * boonoban -- (inverted) "true" to skip test against ban table
-- Output : * booisvaladv -- true if string is valid
-- Depends on functions :
-- [G] lfgtestnum lfgtestlc
-- Depends on constants :
-- * table "contabisbanned"
-- Incoming empty string is safe but type "nil" is NOT.
-- Digit is tolerable only ("and" applies):
-- * if boodigit is "true"
-- * if length is 3 char:s
-- * in middle position
-- Dashes are tolerable (except in special code "-") only ("and" applies):
-- * if length is at least 4 char:s (if this is permitted at all)
-- * in inner positions
-- * NOT adjacent
-- * maximally TWO totally
-- There may be maximally 3 adjacent letters, this makes at least ONE dash
-- obligatory for length 4...7, and TWO dashes for length 8...10.
local function lfivalidatelnkoadv (strqooq, booyesdsh, booyesqst, booloonkg, boodigit, boonoban)
local varomongkosong = 0 -- for check against the ban list
local numchiiar = 0
local numukurran = 0
local numindeex = 0 -- ZERO-based -- two loops
local numadjlet = 0 -- number of adjacent letters (max 3)
local numadjdsh = 0 -- number of adjacent dashes (max 1)
local numtotdsh = 0 -- total number of dashes (max 2)
local booislclc = false
local booisdigi = false
local booisdash = false
local booisvaladv = true -- preASSume innocence -- later final verdict here
while true do -- fake (outer) loop
if (strqooq=='-') then
booisvaladv = booyesdsh
break -- to join mark -- good or bad
end--if
if (strqooq=='??') then
booisvaladv = booyesqst
break -- to join mark -- good or bad
end--if
numukurran = string.len (strqooq)
if ((numukurran<2) or (numukurran>10)) then
booisvaladv = false
break -- to join mark -- evil
end--if
if (not booloonkg and (numukurran>3)) then
booisvaladv = false
break -- to join mark -- evil
end--if
numindeex = 0
while true do -- inner genuine loop over char:s
if (numindeex>=numukurran) then
break -- done -- good
end--if
numchiiar = string.byte (strqooq,(numindeex+1),(numindeex+1))
booisdash = (numchiiar==45)
booisdigi = lfgtestnum(numchiiar)
booislclc = lfgtestlc(numchiiar)
if (not (booislclc or booisdigi or booisdash)) then
booisvaladv = false
break -- to join mark -- inherently bad char
end--if
if (booislclc) then
numadjlet = numadjlet + 1
else
numadjlet = 0
end--if
if (booisdigi and ((numukurran~=3) or (numindeex~=1) or (not boodigit))) then
booisvaladv = false
break -- to join mark -- illegal digit
end--if
if (booisdash) then
if ((numukurran<4) or (numindeex==0) or ((numindeex+1)==numukurran)) then
booisvaladv = false
break -- to join mark -- illegal dash
end--if
numadjdsh = numadjdsh + 1
numtotdsh = numtotdsh + 1 -- total
else
numadjdsh = 0 -- do NOT zeroize the total !!!
end--if
if ((numadjlet>3) or (numadjdsh>1) or (numtotdsh>2)) then
booisvaladv = false
break -- to join mark -- evil
end--if
numindeex = numindeex + 1 -- ZERO-based
end--while -- inner genuine loop over char:s
if (not boonoban) then -- if "yesban" then
numindeex = 0
while true do -- lower inner genuine loop
varomongkosong = contabisbanned[numindeex+1] -- number of elem unknown
if (type(varomongkosong)~='string') then
break -- abort inner loop (then outer fake loop) due to end of table
end--if
numukurran = string.len (varomongkosong)
if ((numukurran<2) or (numukurran>3)) then
break -- abort inner loop (then outer fake loop) due to faulty table
end--if
if (strqooq==varomongkosong) then
booisvaladv = false
break -- abort inner loop (then outer fake loop) due to violation
end--if
numindeex = numindeex + 1 -- ZERO-based
end--while -- lower inner genuine loop
end--if (not boonoban) then
break -- finally to join mark
end--while -- fake loop -- join mark
return booisvaladv
end--function lfivalidatelnkoadv
------------------------------------------------------------------------
---- HIGH LEVEL FUNCTIONS [H] ----
------------------------------------------------------------------------
-- Local function LFBREW3KAT
-- Brew 3 categories from the bad langcode (generic and specific nome
-- and specific loke).
-- Input : * strkatbase (kategory base name with namespace prefix)
-- * strspecnome (the bad langcode already sanitized)
-- * strspecloke (the caller name already sanitized)
-- * booxgene, booxnome, booxloke
-- Output : * strtigakucing (can be empty)
local function lfbrew3kat (strkatbase, strspecnome, strspecloke, booxgene, booxnome, booxloke)
local strtigakucing = ''
if (booxgene) then
strtigakucing = '[[' .. strkatbase .. ']]'
end--if
if (booxnome) then
strtigakucing = strtigakucing .. '[[' .. strkatbase .. ' nome (' .. strspecnome .. ')]]'
end--if
if (booxloke) then
strtigakucing = strtigakucing .. '[[' .. strkatbase .. ' loke (' .. strspecloke .. ')]]'
end--if
return strtigakucing
end--function lfbrew3kat
------------------------------------------------------------------------
---- VARIABLES [R] ----
------------------------------------------------------------------------
function exporttable.ek (arxframent)
-- general unknown type
local vartmp = 0 -- variable without type
-- special type "args" AKA "arx"
local arxourown = 0 -- metaized "args" from our own "frame"
local arxcaller = 0 -- metaized "args" from caller's "frame"
-- general "tab"
local tablg76yleft = {}
-- general "str"
local strkodo3 = "" -- code (obligatory)
local strkodo4 = "" -- code (optional)
local strpncalco = "" -- pagename core of the caller
local strxx = "" -- DEPRECATED 5 char:s
local stryy = "" -- new 8 char:s
local strret = "" -- output string
local strtmp = "" -- temp for peeking caller (was an undeclared global)
-- general "num"
local numbtkmo = 98 -- operation mode / type of result: "b" or "t" or "k"
local num012st3k = 2 -- tri-state sta "strkodo3"
local num012st4k = 2 -- tri-state sta "strkodo4" (remains 2 if only 1 code)
local num012stzz = 2 -- tri-state combo = min (num012st3k,num012st4k)
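local num2statcode = 0 -- element [2] of the loaddata table, must be ZERO (was an undeclared global)
local numlong = 0 -- temp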
local numchar = 0 -- temp
local numbull = 0 -- temp for peeking caller
local numposcol = 0 -- temp for peeking caller ONE-based position of colon
-- general "boo"
local boochktwo = false
local boodashgd = false -- allow "-"
local boodblqgd = false -- allow "??"
local boolonggd = false -- allow long codes such as "zh-min-nan"
local boodigigd = false -- allow digit in middle position
local booskipbt = false -- (inverted) skip test against ban table
local boonocat = false -- from "nocat=true"
local boodetxt = false -- from "detxt=true"
local boointer = false -- "true" on internal error (blocks categorization)
local boodoccek = false -- temp: do the check at all (maybe can be skipped)
local boonevagene = true -- @ for "constrfilter", default is "true"
local boonevanome = true -- @
local boonevaloke = true -- @
local boonekogene = true -- @
local boonekonome = true -- @
local boonekoloke = true -- @
------------------------------------------------------------------------
---- MAIN [Z] ----
------------------------------------------------------------------------
---- GUARD AGAINST INTERNAL ERROR ----
boointer = qbooguard
---- PICK ONE SUBTABLE ----
while true do -- fake loop
if (boointer) then
break -- to join mark
end--if
num2statcode = qldingvoj[2] -- risk of type "nil"
if (num2statcode~=0) then
boointer = true -- #E02 malica
break -- to join mark
end--if
tablg76yleft = qldingvoj['T76']
if (type(tablg76yleft)~='table') then -- important check
boointer = true -- #E02 malica
break -- to join mark
end--if
break -- finally to join mark
end--while -- fake loop -- join mark
---- SEIZE CALLER'S NAME FROM MW (ONLY CORE NEEDED, NO PREFIX) ----
-- assigns "strpncalco" (pagename core) at least one char
-- a possible failure here is NOT fatal
vartmp = arxframent:getParent():getTitle()
numposcol = 0
strpncalco = ''
if (type(vartmp)=="string") then
strtmp = vartmp
numbull = string.len (strtmp)
if (numbull>2) then
vartmp = string.find (strtmp, ':', 1, true) -- plain text search
if (vartmp~=nil) then -- "not found" is NOT valid
numposcol = vartmp -- ONE-based position
if ((numposcol==1) or (numposcol==numbull)) then
numposcol = 0 -- invalid position of colon
end--if
end--if
end--if (numbull>2) then
end--if
if (numposcol~=0) then
strpncalco = string.sub (strtmp,(numposcol+1),numbull) -- remove prefix
end--if
---- GET THE ARX:ES ----
if (not boointer) then
arxourown = arxframent.args -- "args" from our own "frame"
arxcaller = arxframent:getParent().args -- "args" from caller's "frame"
end--if
---- SEIZE 1 OR 2 ANONYMOUS PARAMETERS AND "XX=" "YY=" SENT BY CALLER TO US ----
while true do -- fake loop
if (boointer) then
break -- to join mark
end--if
if (arxourown[3]) then
boointer = true -- internal error -- was preassigned to "false"
break -- to join mark -- 3 anon params are not appreciated
end--if
vartmp = arxourown[1] -- can be "nil"
if (type(vartmp)=="string") then
strkodo3 = vartmp -- empty string is tolerated here, it gets rejected later
end--if
vartmp = arxourown[2] -- can be "nil"
if (type(vartmp)=="string") then
strkodo4 = vartmp -- empty string is tolerated here, it gets rejected later
end--if
vartmp = arxourown['xx'] -- can be "nil" -- optional named !!!FIXME!!! deprecated
if (lfgstringrange(vartmp,5,5)) then
strxx = vartmp
end--if
vartmp = arxourown['yy'] -- can be "nil" -- optional named
if (lfgstringrange(vartmp,8,8)) then
stryy = vartmp
end--if
break -- finally to join mark
end--while -- fake loop -- join mark
while true do -- fake loop !!!FIXME!!! use LFIVALIUMDCTLSTR
if (boointer) then
break -- to join mark
end--if
if ((strxx~='') and (stryy=='')) then -- !!!FIXME!!! DEPRECATED and "allow digit" is already removed
numchar = string.byte (strxx,1,1) -- enum "b" (default) or "t" "k"
if (numchar==116) then
numbtkmo = 116 -- requested "t" -- was preassigned to 98 ie "b"
else
if (numchar==107) then
numbtkmo = 107 -- requested "k" -- was preassigned to 98 ie "b"
else
if (numchar~=98) then
boointer = true -- internal error -- was preassigned to "false"
break -- to join mark
end--if
end--if
end--if (numchar==116) else
numchar = string.byte (strxx,2,2)
if (numchar==49) then
boochktwo = true -- was preassigned to "false" -- check 2 codes
else
if (numchar~=48) then
boointer = true -- internal error -- was preassigned to "false"
break -- to join mark
end--if
end--if (numchar==49) else
numchar = string.byte (strxx,3,3) -- fourstate "0" ... "3"
if ((numchar<48) or (numchar>51)) then
boointer = true -- internal error -- was preassigned to "false"
break -- to join mark
end--if
boodashgd = ((numchar==49) or (numchar==51)) -- allow "-"
boodblqgd = ((numchar==50) or (numchar==51)) -- allow "??"
numchar = string.byte (strxx,5,5)
if (numchar==49) then
booskipbt = true -- was preassigned to "false" -- skip extra ban test
else
if (numchar~=48) then
boointer = true -- internal error -- was preassigned to "false"
break -- to join mark
end--if
end--if (numchar==49) else
end--if
if ((strxx=='') and (stryy~='')) then
numchar = string.byte (stryy,1,1) -- enum "b" (default) or "t" "k"
if (numchar==116) then
numbtkmo = 116 -- requested "t" -- was preassigned to 98 ie "b"
else
if (numchar==107) then
numbtkmo = 107 -- requested "k" -- was preassigned to 98 ie "b"
else
if (numchar~=98) then
boointer = true -- internal error -- was preassigned to "false"
break -- to join mark
end--if
end--if
end--if (numchar==116) else
numchar = string.byte (stryy,2,2) -- boolean
if (numchar==49) then
boochktwo = true -- was preassigned to "false" -- check 2 codes
else
if (numchar~=48) then
boointer = true -- internal error -- was preassigned to "false"
break -- to join mark
end--if
end--if (numchar==49) else
numchar = string.byte (stryy,4,4) -- boolean
if ((numchar<48) or (numchar>49)) then
boointer = true -- internal error -- was preassigned to "false"
break -- to join mark
end--if
boodashgd = (numchar==49) -- allow "-"
numchar = string.byte (stryy,5,5) -- boolean
if ((numchar<48) or (numchar>49)) then
boointer = true -- internal error -- was preassigned to "false"
break -- to join mark
end--if
boodblqgd = (numchar==49) -- allow "??"
numchar = string.byte (stryy,6,6) -- boolean
if (numchar==49) then
boolonggd = true -- was preassigned to "false" -- allow long codes
else
if (numchar~=48) then
boointer = true -- internal error -- was preassigned to "false"
break -- to join mark
end--if
end--if (numchar==49) else
numchar = string.byte (stryy,7,7) -- boolean
if (numchar==49) then
boodigigd = true -- was preassigned to "false" -- allow digit
else
if (numchar~=48) then
boointer = true -- internal error -- was preassigned to "false"
break -- to join mark
end--if
end--if (numchar==49) else
numchar = string.byte (stryy,8,8) -- boolean
if (numchar==49) then
booskipbt = true -- was preassigned to "false" -- skip extra ban test
else
if (numchar~=48) then
boointer = true -- internal error -- was preassigned to "false"
break -- to join mark
end--if
end--if (numchar==49) else
end--if
break -- finally to join mark
end--while -- fake loop -- join mark
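-- The "yy=" decoding above could be sketched in table-driven form as
-- shown below. This is only an illustration kept inside a comment:
-- "lfdecodeyy" and the field names in "tabpos" are hypothetical and do
-- not exist in this module. Like the code above, the sketch does not
-- inspect the separator at position 3.
--[==[
local function lfdecodeyy (stryy) -- hypothetical helper
  if ((type(stryy)~='string') or (string.len(stryy)~=8)) then
    return nil -- wrong type or wrong length
  end
  local strmode = string.sub (stryy,1,1) -- "b" or "t" or "k"
  if ((strmode~='b') and (strmode~='t') and (strmode~='k')) then
    return nil -- bad result type
  end
  local tabres = { mode = strmode }
  -- 1-based positions of the 6 boolean digits
  local tabpos = { [2]='chktwo', [4]='dash', [5]='dblq', [6]='long', [7]='digit', [8]='skipbt' }
  for numpos, strname in pairs (tabpos) do
    local strone = string.sub (stryy,numpos,numpos)
    if ((strone~='0') and (strone~='1')) then
      return nil -- neither "0" nor "1"
    end
    tabres[strname] = (strone=='1')
  end
  return tabres
end
-- lfdecodeyy ("t1-10000") gives mode "t" with "chktwo" and "dash" true,
-- and "dblq" "long" "digit" "skipbt" false
]==]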
---- SEIZE 2 OPTIONAL NAMED PARAM SENT BY SOMEONE TO US OR TO CALLER ----
-- "detxt" overrides "nocat"
if (not boointer) then
vartmp = arxourown['detxt'] -- can be "nil"
boodetxt = (vartmp=='true')
vartmp = arxcaller['detxt'] -- can be "nil"
if (type(vartmp)=='string') then -- override only if text given
boodetxt = (vartmp=='true')
end--if
end--if
if ((boointer==false) and (boodetxt==false)) then
vartmp = arxourown['nocat'] -- can be "nil"
boonocat = (vartmp=='true')
vartmp = arxcaller['nocat'] -- can be "nil"
if (type(vartmp)=='string') then -- override only if text given
boonocat = (vartmp=='true')
end--if
end--if
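-- The lookup rule used above for "detxt" and "nocat" (own frame first,
-- caller's frame overrides whenever it supplies any text) could be
-- factored out roughly as below. This is a sketch inside a comment only,
-- "lfgetbool" is a hypothetical name and not part of this module.
--[==[
local function lfgetbool (tabown, tabcaller, strname) -- hypothetical helper
  local boores = (tabown[strname]=='true') -- "nil" gives "false"
  local varcal = tabcaller[strname] -- can be "nil"
  if (type(varcal)=='string') then -- override only if text given
    boores = (varcal=='true') -- even an empty string overrides to "false"
  end
  return boores
end
-- lfgetbool ({nocat='true'}, {}, 'nocat') gives true
-- lfgetbool ({nocat='true'}, {nocat=''}, 'nocat') gives false
]==]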
---- CARRY OUT THE HARD WORK -- TEST FOR OBVIOUS INVALIDITY ----
-- this hard work is NOT needed if:
-- # we already have an internal error
-- or
-- # result mode is "k" category and we have "nocat=true"
boodoccek = true -- assume the check is needed until ruled out below
if (boointer) then
boodoccek = false
end--if
if ((numbtkmo==107) and boonocat) then -- "k" and "nocat=true"
boodoccek = false
end--if
if (boodoccek) then
if (not lfivalidatelnkoadv(strkodo3,boodashgd,boodblqgd,boolonggd,boodigigd,booskipbt)) then
num012st3k = 0
end--if
end--if
if (boodoccek and boochktwo) then
if (not lfivalidatelnkoadv(strkodo4,boodashgd,boodblqgd,boolonggd,boodigigd,booskipbt)) then
num012st4k = 0
end--if
end--if
---- CHECK WHETHER THE CODES ARE KNOWN IE SUPPORTED ----
-- this hard work is NOT needed if:
-- # we already have an internal error
-- or
-- # result mode is "b" binary (then we do not distinguish "1" from "2")
-- or
-- # result mode is "k" category and we have "nocat=true"
boodoccek = true
if (boointer) then
boodoccek = false
end--if
if (numbtkmo==98) then -- "b" boolean / binary mode
boodoccek = false
end--if
if ((numbtkmo==107) and boonocat) then -- "k" and "nocat=true"
boodoccek = false
end--if
if ((num012st3k==2) and boodoccek) then -- 2 means known
if (type(tablg76yleft[strkodo3])~='string') then
num012st3k = 1 -- degrade to 1 unknown
end--if
end--if
if ((num012st4k==2) and boodoccek and boochktwo) then -- 2 means known
if (type(tablg76yleft[strkodo4])~='string') then
num012st4k = 1 -- degrade to 1 unknown
end--if
end--if
---- BREW MIN ----
-- we have 2 separate tristate results "num012st3k" and "num012st4k"
-- "num012st4k" was preassigned to 2 and remains 2 all the time if
-- only one code is tested
-- an internal error ie "boointer" = "true" results in "num012stzz"
-- assigned to ZERO
-- combo result num012stzz = min (num012st3k,num012st4k)
if (boointer) then
num012stzz = 0 -- internal error forces the combo result to invalid
else
num012stzz = math.min (num012st3k, num012st4k)
end--if
---- ASSIGN "STRRET" TO BOOLEAN OR TRISTATE ----
-- possible modes in "numbtkmo" are 98 "b" (default) or 116 "t" or 107 "k"
if (numbtkmo==116) then -- "t" tristate
strret = string.char(num012stzz+48) -- 0...2 to ASCII digit "0"..."2"
end--if
if (numbtkmo==98) then -- "b" boolean
strret = "1" -- assume valid -- report "tolerable" (state was 1 or 2)
if (num012stzz==0) then
strret = "0" -- report "evil" (0)
end--if
end--if
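-- The mapping from the combined state "num012stzz" and the mode to the
-- result string, as done above, can be summarized by the sketch below.
-- It is kept inside a comment, and "lfresultstring" is a hypothetical
-- name that does not exist in this module.
--[==[
local function lfresultstring (numstate, nummode) -- hypothetical helper
  if (nummode==116) then -- "t" tristate
    return string.char (numstate+48) -- "0" or "1" or "2"
  end
  if (nummode==98) then -- "b" boolean
    return ((numstate==0) and "0") or "1" -- 1 and 2 both give "1"
  end
  return "" -- "k" mode, categories are brewed separately
end
-- state 0 (invalid) : "t" gives "0" , "b" gives "0"
-- state 1 (unknown) : "t" gives "1" , "b" gives "1"
-- state 2 (known)   : "t" gives "2" , "b" gives "1"
]==]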
---- BREW CATEGORIES ----
-- here we process the "k" mode
-- we use the 2 separate tristate results "num012st3k" and "num012st4k"
-- 0 invalid -- 1 unknown -- 2 known
-- the strings are "constrneva" and "constrneko" and include the category
-- prefix and the name, but not "[[","]]" nor space before details
-- ("nome", "loke", "(", ")")
-- for obviously invalid codes we use "constrneva" and
-- include the bad string (sanitized)
-- for unknown codes we use "constrneko" and include the bad string
-- categorization can be blocked by several conditions:
-- # totally by "num012stzz" = 2 (code is or both codes are
-- known, no need to whine)
-- # totally by "nocat=true"
-- # totally by "boointer=true" (junk categories on
-- internal error are NOT appreciated)
-- # selectively by "num012st.." = 2 (code is known, no need to whine)
-- # selectively by ZERO:s in "constrfilter", length must be
-- 6 char:s, default for those 6 boolean values is true
-- "strret" was preassigned to empty "" and is so
-- far untouched in the "k" mode
if ((num012stzz~=2) and (numbtkmo==107) and (boointer==false) and (boonocat==false)) then
if (string.len(constrfilter)==6) then
if (string.byte(constrfilter,1,1)==48) then
boonevagene = false
end--if
if (string.byte(constrfilter,2,2)==48) then
boonevanome = false
end--if
if (string.byte(constrfilter,3,3)==48) then
boonevaloke = false
end--if
if (string.byte(constrfilter,4,4)==48) then
boonekogene = false
end--if
if (string.byte(constrfilter,5,5)==48) then
boonekonome = false
end--if
if (string.byte(constrfilter,6,6)==48) then
boonekoloke = false
end--if
end--if
strkodo3 = lfifixunsafe (strkodo3) -- sanitize language code (redundant for num012st3k==1)
strpncalco = lfifixunsafe (strpncalco) -- sanitize caller
if (num012st3k==0) then -- invalid (0) and unknown (1) are handled separately for each of "num012st3k" & "num012st4k"
strret = strret .. lfbrew3kat (constrneva, strkodo3, strpncalco, boonevagene, boonevanome, boonevaloke)
end--if
if (num012st3k==1) then
strret = strret .. lfbrew3kat (constrneko, strkodo3, strpncalco, boonekogene, boonekonome, boonekoloke)
end--if
if (boochktwo) then
strkodo4 = lfifixunsafe (strkodo4) -- sanitize language code (redundant for num012st4k==1)
if (num012st4k==0) then -- invalid (0) and unknown (1) are handled separately for each of "num012st3k" & "num012st4k"
strret = strret .. lfbrew3kat (constrneva, strkodo4, strpncalco, boonevagene, boonevanome, boonevaloke)
end--if
if (num012st4k==1) then
strret = strret .. lfbrew3kat (constrneko, strkodo4, strpncalco, boonekogene, boonekonome, boonekoloke)
end--if
end--if
if (boodetxt) then
strret = lfidecencodmin (strret)
end--if
end--if ((num012stzz~=2) ... (boonocat==false)) then
---- RETURN THE RESULT STRING ----
return strret -- can vary depending on result type, even be empty
end--function
---- RETURN THE EXPORT TABLE ----
return exporttable