Modulo:msplitter

This module is used in many places.
If you know the possible consequences, you may edit it with care.
If you do not dare to edit, you can propose the desired change on the talk page.
This is the documentation subpage (without useful content).
No separate self-test is available. See the page Modulo:mlawc/dokumentado for a related self-test.
This module serves ((mlawc)) and {{lili}} and follows the specification.

--[===[

MODULE "MSPLITTER"

"eo.wiktionary.org/wiki/Modulo:msplitter" <!--2021-Mar-11-->
"id.wiktionary.org/wiki/Modul:msplitter"

Purpose: submodule for "mlawc"

Utilo: submodulo por "mlawc"

Manfaat: submodul untuk "mlawc"

Syfte: submodul foer "mlawc"

Used by templates / Uzata far sxablonoj / Digunakan oleh templat:
- none (this module cannot be called from a template)

Required submodules / Bezonataj submoduloj / Submodul yang diperlukan:
- none

Incoming: - single table with following content (everything must be
            prevalidated by the caller):
            -  0 (boo) -- desirability of compound cat:s
            -  1 (str) -- pagename AKA input lemma (may NOT be empty)
            -  2 (num) -- split strategy (0...5 or 7)
            -  3 (tab) -- fragments from "%"-syntax assisted split
            -  4 (tab) -- fragments from "#"-syntax assisted split
            -  5 (tab) -- fragments for manual split
            -  6 (tab) -- fragments from extra parameter
            -  7 (boo) -- true if extra parameter was used
            -  8 (tab) -- lng stuff with double-letter indexes
            -  9 (boo) -- NR word class
            - 10 (boo) -- KA word class

- desirability of compound cat:s -- index 0 (we split even if
                                             false but no cat:s then)
- lemma (may NOT be empty) -- index 1
- split control parameter -- index 2 3 4 5
- extra parameter -- index 6 7
- language stuff (code and some variants of language name) -- index 8
- word class (reduced to 2 questions) -- index 9 10

Returned: - single table with following content:
            -  0...17 (str) category names
            - 20...37 (nil or boo) main page flags
            - 40      (str) output lemma wikitext or "//" on error
            - 41      (str) debug "qstrtrace"

The split strategies available are:
- #S0 automatic multiword split
- #S1 assisted split
- #S2 manual split
- #S3 simple root split
- #S4 simple bare root
- #S5 large letter split
- #S6 reserved
- #S7 no split (splitter still may be called and extra parameter is processed)

List of 6+1+1+1 selectable morpheme types:

C  circumfix           cirkumfikso
I  infix               infikso (EO: -o- -et- -il- ...)
M  standalone root     memstara radiko (EO: tri dek post ...)
N  nonstandalone root  nememstara radiko (EO: fer voj ...)
P  prefix              prefikso
U  suffix              sufikso (postfikso, finajxo, EO: -a -j -n)
-------
W  word                vorto
-------
L  same as "N" but changes linking behavior (only in F210)
-------
X  only after "&" in the extra parameter (caller converts it for us)

These mortyp:s can be used with manual split in the split control parameter
before the colon ":". They can also be used in the extra parameter, either
after "&" or in fragments before ":" or "!", but there "L" is prohibited
(thus C I M N P U W are left, plus maybe X). See "spec-splitter-en.txt" for
syntax details.

We put only the letter symbol of the mortyp into the category name (except
for the type word) as the name otherwise would become unreasonably long.
The category name must contain 3 pieces of information:
- language (consider "-an" in SV and ID)
- mortyp (consider "-an" and "an-" and "an" in SV)
- the morpheme / affix / word itself

It is possible to deactivate (semi-hardcoded configuration in the source
code of "mlawc") only the compound categories, or the splitter (resulting in
the raw lemma shown without link), or showing of the lemma altogether.
In both latter cases the splitter is inactive and this module is not called
at all.

The automatic splitter ("numsplyt" = 0 and "lfsplitaa") is fully
automatic and the 2 tables at index 3 and 4 must be empty then.
No error can occur here, but the split can fail: if no split boundaries
can be applied then the output is identical to the input.

The assisted splitter ("numsplyt" = 1 and "lfsplitaa") is
controlled by 2 prevalidated tables.
* Table contains up to 16 values indexed by integers 0 to 15,
  value type string "1" means do block, type "nil" means do not
  block (the default). Other values should not occur and evaluate to
  do not block like "nil" does.
* Table contains up to 16 values indexed by integers 0 to 15, value:
  * type string:
    * "N" or "I" or "A" (as described in "spec-splitter-en.txt")
    * colon ":" followed by the link target (length 1...40 octet:s NOT
      checked anymore here)
    Beginning char other than "N" or "I" or "A" or ":" should not
    occur and evaluates to do nothing unusual like "nil" does.
  * type "nil" means do nothing unusual (the default)
No error can occur in the assisted splitter, but the split can fail: if no
split boundaries can be applied then the output is identical to the input.

The manual splitter ("numsplyt" = 2 and "lfsplitmn") is controlled by one
prevalidated table. The pagename does not even enter the split process,
but a bool revealing whether it contains at least one space does.
* Table contains 1 to 16 strings indexed by integers 0 to 15,
  one string for every fragment. The 5 legal types are:
  * F000 : no brackets, no colon, no slash (visible text no link)
  * F200 : 2 brackets, no colon, no slash (combo target visible text)
  * F201 : 2 brackets, no colon, 1 slash (target / visible text)
  * F210 : 2 brackets, 1 colon, no slash (mortyp : combo target visible text)
  * F211 : 2 brackets, 1 colon, 1 slash (mortyp : target / visible text)
No error can occur in the manual splitter, and no failure due to lack of
boundaries either; the "sum check" is part of the prevalidation.
Note that we use slashes and single square brackets "+[I:bug/BUG]"
instead of wikisyntax "[[bug|BUG]]". Beware that "[bug|BUG]" would NOT work.

]===]
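
-- Illustrative sketch only (NOT called by this module): the 5 legal fragment
-- types of the manual splitter listed above can be told apart by counting
-- brackets, colons and slashes. The name "lfdemofragtype" is hypothetical,
-- and we assume that a fragment arrives as for example "[I:bug/BUG]" without
-- the leading "+". The real prevalidation is done by the caller and may
-- differ, most notably the position of the brackets is NOT checked here.

local function lfdemofragtype (strfragment)
  local strdummy = ''
  local numbrackets = 0
  local numcolons = 0
  local numslashes = 0
  local strtype = ''
  local tablegal = {F000=true, F200=true, F201=true, F210=true, F211=true}
  strdummy, numbrackets = string.gsub (strfragment,"[%[%]]","") -- "[" and "]"
  strdummy, numcolons = string.gsub (strfragment,":","")
  strdummy, numslashes = string.gsub (strfragment,"/","")
  strtype = "F" .. numbrackets .. numcolons .. numslashes
  if (tablegal[strtype]~=true) then
    strtype = '' -- none of the 5 legal types
  end--if
  return strtype -- "F000" ... "F211" or empty string
end--function lfdemofragtype

-- For example "[I:bug/BUG]" has 2 brackets, 1 colon and 1 slash and is thus
-- classified as "F211", whereas plain "bug" is classified as "F000".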

local splitter = {}

------------------------------------------------------------------------

---- CONSTANTS ----

------------------------------------------------------------------------

  -- uncommentable EO vs ID constant table (categories)

  -- syntax of insertion and discarding magic string:
  -- "@" followed by 2 uppercase letters and 2 hex numbers
  -- otherwise the hit is not processed, but copied as-is instead
  -- 2 letters select the insertable item from table supplied by the caller
  -- 2 hex numbers control discarding left and right (0...15 char:s)

  -- empty item is legal and results in discarding if some number is non-ZERO

  -- if uppercasing or other adjustment is needed then the caller must take
  -- care of it in the form of 2 or more separate items provided in the table

  -- insertable items defined:
  -- constant:
  -- * LK lng code (unknown "??" legal but take care elsewhere)
  -- * LN lng name (unknown legal, for example "dana" or "Ido")
  -- * LU lng name uppercased (unknown legal, for example "Dana" or "Ido")
  -- * LO lng name not own (empty or nil if own)
  -- * LV lng name uppercased not own (empty or nil if own)
  -- * LY lng name long (for example "bahasa Swedia")
  -- * LZ lng name long not own (empty or nil if own)
  -- * SC script code (for example "T", "S", "P" for ZH, "C" "L" for SH)
  -- variable (we can have 2 word classes):
  -- * WC word class name (for example "substantivo")
  -- * WU word class name uppercased (for example "Substantivo")
  -- * MT mortyp code (for example "C")
  -- * FR fragment (for example "peN-...-an" or "abelujo")

  -- see "lfinsertultim" and "tablngdbl"; use space here and avoid "_"
  -- note the malicious false friendship between EO:frazo and ID:frasa

  local contabktaoj = {}
  contabktaoj[3] = 'Vortgrupo -@LK00- enhavanta (@FR00) @SC10'             -- EO only if ("boocatdesir" is true) can be many
  -- contabktaoj[3] = 'Frasa @LZ10 mengandung kata @FR00 @SC10'               -- ID only if ("boocatdesir" is true) can be many
  contabktaoj[4] = 'Frazo -@LK00- enhavanta vorton (@FR00) @SC10'          -- EO only if ("boocatdesir" is true) can be many
  -- contabktaoj[4] = 'Kalimat @LK00 mengandung kata (@FR00) @SC10'           -- ID only if ("boocatdesir" is true) can be many
  contabktaoj[5] = 'Vorto -@LK00- enhavanta morfemon @MT00 (@FR00) @SC10'  -- EO only if ("boocatdesir" is true) can be many
  -- contabktaoj[5] = 'Kata @LK00 mengandung morfem @MT00 (@FR00) @SC10'      -- ID only if ("boocatdesir" is true) can be many

------------------------------------------------------------------------

---- SPECIAL STUFF OUTSIDE MAIN FUNCTION ----

------------------------------------------------------------------------

---- VAR:S ----

local qstrtrace = ""     -- for main & sub:s, debug report sent to caller
local qtabktaoj = {}     -- global for compound categories [0]...[41] and ret

------------------------------------------------------------------------

---- ORDINARY LOCAL DEBUG FUNCTIONS ----

------------------------------------------------------------------------

-- Local function LFTRACEMSG

-- for variables the other sub "lfshowvar" is preferable but in exceptional
-- cases it can be justified to send text containing variables to this sub

-- enhances global "qstrtrace" (may NOT be type "nil")

local function lftracemsg (strbigcrap)
  qstrtrace = qstrtrace .. "<br>" .. strbigcrap .. '.'
end--function lftracemsg

------------------------------------------------------------------------

---- ORDINARY LOCAL MATH FUNCTIONS ----

------------------------------------------------------------------------

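local function mathmod (xdividendo, xdivisoro)
  local resultmod = 0 -- MOD operator is "%", a bitwise AND operator is lacking
  resultmod = xdividendo % xdivisoro
  return resultmod
end--function mathmod

-- test bit "numbitinx" (ZERO-based) of UINT "numincoming", return bool
local function mathbittest (numincoming, numbitinx)
  return ((mathmod(math.floor(numincoming/(2^numbitinx)),2))==1)
end--function mathbittest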

------------------------------------------------------------------------

---- ORDINARY LOCAL STRING FUNCTIONS ----

------------------------------------------------------------------------

-- test whether char is an ASCII uppercase letter, return bool

local function lftestuc (numkode)
  local booupperc = false
  booupperc = ((numkode>=65) and (numkode<=90))
  return booupperc
end--function lftestuc

------------------------------------------------------------------------

-- test whether char is an ASCII lowercase letter, return bool

local function lftestlc (numcode)
  local boolowerc = false
  boolowerc = ((numcode>=97) and (numcode<=122))
  return boolowerc
end--function lftestlc

------------------------------------------------------------------------

-- Local function LFTESTPUNCTURE

-- test whether char is a punctuation sign, return bool

-- punctuation (5 char:s: ! , . ; ?) 21 33 | 2C 44 | 2E 46 | 3B 59 | 3F 63
-- dash "-" and apo "'" do NOT count as punctuation
-- here we do NOT include SPACE in the list

local function lftestpuncture (numcorde)
  local boopunk = false
  boopunk = ((numcorde==33) or (numcorde==44) or (numcorde==46) or (numcorde==59) or (numcorde==63))
  return boopunk
end--function lftestpuncture

------------------------------------------------------------------------

-- Local function LFADDTHEDASH

local function lfaddthedash (strafikso, booaddleft, booaddright)
  local numdashlength = 0
  local numbuggar = 0
  numdashlength = string.len (strafikso)
  if (numdashlength~=0) then
    numbuggar = string.byte (strafikso,1,1)
    if (numbuggar==45) then
      booaddleft = false -- avoid "--"...
    end--if
    numbuggar = string.byte (strafikso,numdashlength,numdashlength)
    if (numbuggar==45) then
      booaddright = false -- avoid ..."--"
    end--if
    if (booaddleft) then
      strafikso = "-" .. strafikso
    end--if
    if (booaddright) then
      strafikso = strafikso .. "-"
    end--if
  end--if
  return strafikso
end--function lfaddthedash

------------------------------------------------------------------------

-- Local function LFDEBRACKET

-- Separate bracketed part of a string and return the inner or outer
-- part. On failure the string is returned complete and unchanged.

-- Note that for length of hit ZERO ie "()" we have "numbegg" + 1 = "numendd"
-- and for length of hit ONE ie "(x)" we have "numbegg" + 2 = "numendd".

-- "numxminlencz" must be >= 1 !!!

local function lfdebracket (strdeath, boooutside, numxminlencz)

  local numindoux = 1 -- ONE-based
  local numdlong = 0
  local numwesel = 0
  local numbegg = 0 -- ONE-based, ZERO invalid
  local numendd = 0 -- ONE-based, ZERO invalid

  numdlong = string.len (strdeath)
  while (true) do
    if (numindoux>numdlong) then
      break -- ONE-based -- if both "numbegg" "numendd" non-ZERO then maybe
    end--if
    numwesel = string.byte(strdeath,numindoux,numindoux)
    if (numwesel==40) then -- "("
      if (numbegg==0) then
        numbegg = numindoux -- pos of "("
      else
        numbegg = 0
        break -- damn: more than 1 "(" present
      end--if
    end--if
    if (numwesel==41) then -- ")"
      if ((numendd==0) and (numbegg~=0) and ((numbegg+numxminlencz)<numindoux)) then
        numendd = numindoux -- pos of ")"
      else
        numendd = 0
        break -- damn: more than 1 ")" present or ")" precedes "("
      end--if
    end--if
    numindoux = numindoux + 1
  end--while

  if ((numbegg~=0) and (numendd~=0)) then
    if (boooutside) then
      strdeath = string.sub(strdeath,1,(numbegg-1)) .. string.sub(strdeath,(numendd+1),numdlong)
    else
      strdeath = string.sub(strdeath,(numbegg+1),(numendd-1)) -- separate substring
    end--if
  end--if

  return strdeath -- same string variable

end--function lfdebracket
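
-- Illustrative sketch only (NOT called by this module): the contract of
-- "lfdebracket" for "numxminlencz" = 1 expressed with a Lua pattern. The
-- name "lfdemodebracket" is hypothetical. Exactly one "(" ... ")" pair with
-- nonempty content must be present, otherwise the string is returned
-- complete and unchanged.

local function lfdemodebracket (strfull, boooutside)
  local strpre, strinner, strpost = string.match (strfull,"^([^%(%)]*)%(([^%(%)]+)%)([^%(%)]*)$")
  if (strinner==nil) then
    return strfull -- failure: no pair, or more than one, or empty "()"
  end--if
  if (boooutside) then
    return strpre .. strpost -- outer part
  end--if
  return strinner -- inner part
end--function lfdemodebracket

-- For "peN-(tulis)-an" the inner part is "tulis" and the outer part
-- is "peN--an".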

------------------------------------------------------------------------

-- Local function LFREMOVE2BRA

local function lfremove2bra (strinmedparenteser)
  local stroututanparenteser = ''
  local numindozux = 1 -- ONE-based
  local numparepanjang = 0
  local numparechar = 0
  numparepanjang = string.len (strinmedparenteser)
  while (true) do
    if (numindozux>numparepanjang) then
      break
    end--if
    numparechar = string.byte(strinmedparenteser,numindozux,numindozux)
    if ((numparechar~=40) and (numparechar~=41)) then
      stroututanparenteser = stroututanparenteser .. string.char(numparechar)
    end--if
    numindozux = numindozux + 1
  end--while
  return stroututanparenteser
end--function lfremove2bra

------------------------------------------------------------------------

---- ORDINARY LOCAL CONVERSION FUNCTIONS ----

------------------------------------------------------------------------

-- Local function LFONEHEXTOINT

-- Convert 1 ASCII code of a hex digit to an UINT4 ie 0...15 (255 invalid).

-- Only uppercase accepted

local function lfonehextoint (numdigit)
  local numresult = 255
  if ((numdigit>47) and (numdigit<58)) then
    numresult = numdigit-48
  end--if
  if ((numdigit>64) and (numdigit<71)) then
    numresult = numdigit-55
  end--if
  return numresult
end--function lfonehextoint

------------------------------------------------------------------------

---- ORDINARY LOCAL UTF8 FUNCTIONS ----

------------------------------------------------------------------------

-- Local function LFUTF8LENGTH

-- Measure length of a single UTF8 char, return ZERO if invalid.

-- Does NOT thoroughly check the validity, looks at 1 octet only

-- Input  : - numbgoctet (beginning octet of a UTF8 char)

-- Output : - numlen1234x (1...4 or ZERO if invalid)

local function lfutf8length (numbgoctet)
  local numlen1234x = 0
    if (numbgoctet<128) then
      numlen1234x = 1 -- $00...$7F -- ANSI/ASCII
    end--if
    if ((numbgoctet>=194) and (numbgoctet<=223)) then
      numlen1234x = 2 -- $C2 to $DF
    end--if
    if ((numbgoctet>=224) and (numbgoctet<=239)) then
      numlen1234x = 3 -- $E0 to $EF
    end--if
    if ((numbgoctet>=240) and (numbgoctet<=244)) then
      numlen1234x = 4 -- $F0 to $F4
    end--if
  return numlen1234x
end--function lfutf8length

------------------------------------------------------------------------

-- Local function LFCASEGENE

-- Adjust case of a single letter (generous), limited unicode support
-- with some common UTF8 ranges.

-- Input  : * strucinut : single unicode letter (1 or 2 octet:s)
--          * booucas   : for desired uppercase "true" and for
--                        lowercase "false"

-- Output : * strucinut : (same var, unchanged if input is
--                         empty or unknown or invalid)

-- * in ASCII lowercase is $20 above uppercase, b5 reveals
--   the case (1 is upper)
-- * the same is valid in $C3-block
-- * this is NOT valid in $C4-$C5-block, lowercase is usually 1 above
--   uppercase and nothing reveals the case reliably
-- * case delta can be 1 or $20 or $50 or other
-- * lowercase is usually above uppercase but not always
-- * case pair distance can span $40-boundary or even $0100-boundary

-- $C2-block $0080 $C2,$80 ... $00BF $C2,$BF no letters (OTOH NBSP mm)

-- $C3-block $00C0 $C3,$80 ... $00FF $C3,$BF (SV mm) delta $20 UC-LC-UC-LC
-- upper $00C0 $C3,$80 ... $00DF $C3,$9F
-- lower $00E0 $C3,$A0 ... $00FF $C3,$BF
-- AA AE EE NN OE UE mm
-- $D7 $DF $F7 excluded (not letters)
-- $FF excluded (here LC, UC is $0178)

-- $C4-$C5-block $0100 $C4,$80 ... $017F $C5,$BF (EO mm)
-- delta 1 and UC even but messy with many exceptions
-- EO $0108 ... $016D case delta 1
-- for example SX upper $015C $C5,$9C - lower $015D $C5,$9D
-- $0138 $0149 $017F excluded (not letters)
-- $0178 excluded (here UC, LC is $FF)
-- $0100 ... $0137 UC even
-- $0139 ... $0148 reversed (UC odd) note that case delta is NOT reversed
-- $014A ... $0177 UC even again
-- $0179 ... $017E reversed (UC odd) note that case delta is NOT reversed

-- $CC-$CF-block $0300 $CC,$80 ... $03FF $CF,$BF (EL mm) delta $20
-- EL $0370 ... $03FF (officially)
-- strict EL base range $0391 ... $03C9 case delta $20
-- $0391 $CE,$91 ... $03AB $CE,$AB upper
-- $03B1 $CE,$B1 ... $03CB $CF,$8B lower
-- for example "omega" upper $03A9 $CE,$A9 - lower $03C9 $CF,$89

-- $D0-$D3-block $0400 $D0,$80 ... $04FF $D3,$BF (RU mm) delta $20 $50
-- strict RU base range $0410 ... $044F case delta $20 but 1 extra char !!!
-- $0410 $D0,$90 ... $042F $D0,$AF upper
-- $0430 $D0,$B0 ... $044F $D1,$8F lower
-- for example "CCCP-gamma" upper $0413 $D0,$93 - lower $0433 $D0,$B3
-- extra base char and exception is special "E" with horizontal doubledot
--       case delta $50 (upper $0401 $D0,$81 - lower $0451 $D1,$91)
-- same applies for ranges $0400 $D0,$80 ... $040F $D0,$8F upper
--      and $0450 $D1,$90 ... $045F $D1,$9F lower

-- This sub depends on "MATH FUNCTIONS"\"mathmod" and
-- "MATH FUNCTIONS"\"mathbittest" and "STRING FUNCTIONS"\"lftestuc" and
-- "STRING FUNCTIONS"\"lftestlc" and "UTF8 FUNCTIONS"\"lfutf8length".

local function lfcasegene (strucinut, booucas)

  local numlaengden = 0 -- length from "string.len"
  local numchaer = 0 -- UINT8 beginning char
  local numchaes = 0 -- UINT8 later char (BIG ENDIAN, lower value here)
  local numcharel = 0 -- UINT8 code relative to beginning of block $00...$FF
  local numdelabs = 0 -- UINT8 absolute positive delta
  local numdelta = 0 -- SINT16 signed, can be negative
  local numdelcarry = 0 -- SINT8 signed, can be negative

  local boowantlower = false
  local booisuppr = false
  local booislowr = false
  local boopending = false

  local booc3blok = false -- $C3 only $00C0...$00FF SV mm delta 32
  local booc4c5bl = false -- $C4 $C5  $0100...$017F EO mm delta 1
  local boocccfbl = false -- $CC $CF  $0300...$03FF EL mm delta 32
  local bood0d3bl = false -- $D0 $D3  $0400...$04FF RU mm delta 32 80

  while (true) do -- fake loop

    numlaengden = string.len (strucinut)
    if ((numlaengden==0) or (numlaengden>2)) then
      break -- to join mark
    end--if
    numchaer = string.byte (strucinut,1,1)
    if ((lfutf8length(numchaer))~=numlaengden) then
      break -- to join mark -- mismatch with length from sub "lfutf8length"
    end--if
    boowantlower = (not booucas)

    if (numlaengden==1) then
      booisuppr = lftestuc(numchaer)
      booislowr = lftestlc(numchaer)
      if (booisuppr and boowantlower) then
        numdelta = 32 -- ASCII UPPER->lower
      end--if
      if (booislowr and booucas) then
        numdelta = -32 -- ASCII lower->UPPER
      end--if
      break -- to join mark
    end--if

    numchaes = string.byte (strucinut,2,2)
    booc3blok = (numchaer==195) -- case delta is 32
    booc4c5bl = ((numchaer==196) or (numchaer==197)) -- case delta is 1
    boocccfbl = ((numchaer>=204) and (numchaer<=207)) -- case delta is 32
    bood0d3bl = ((numchaer>=208) and (numchaer<=211)) -- case delta is 32 80

    if (booc3blok) then
      boopending = true
      numcharel = numchaes + 64 -- simplified calculation here (begins at $C0)
      if ((numcharel==215) or (numcharel==223) or (numcharel==247)) then
        boopending = false -- not a letter, we are done
      end--if
      if (numcharel==255) then
        boopending = false -- special LC silly "Y" with horizontal doubledot
        if (booucas) then
          numdelta = 121 -- lower->UPPER (distant and reversed)
        end--if
      end--if
      if (boopending) then
        booislowr = (mathbittest(numcharel,5)) -- mostly regular block
        booisuppr = not booislowr
        if (booisuppr and boowantlower) then
          numdelta = 32 -- UPPER->lower
        end--if
        if (booislowr and booucas) then
          numdelta = -32 -- lower->UPPER
        end--if
      end--if (boopending) then
      break -- to join mark
    end--if

    if (booc4c5bl) then
      boopending = true
      numcharel = (numchaer-196)*64 + (numchaes-128) -- begins at $C4
      if ((numcharel==56) or (numcharel==73) or (numcharel==127)) then
        boopending = false -- not a letter, we are done
      end--if
      if (numcharel==120) then
        boopending = false -- special UC silly "Y" with horizontal doubledot
        if (boowantlower) then
          numdelta = -121 -- UPPER->lower (distant and reversed)
        end--if
      end--if
      if (boopending) then
        if (((numcharel>=57) and (numcharel<=73)) or (numcharel>=121)) then
          booislowr = ((mathmod(numcharel,2))==0) -- UC odd (reversed)
        else
          booislowr = ((mathmod(numcharel,2))==1) -- UC even (ordinary)
        end--if
        booisuppr = not booislowr
        if (booisuppr and boowantlower) then
          numdelta = 1 -- UPPER->lower
        end--if
        if (booislowr and booucas) then
          numdelta = -1 -- lower->UPPER
        end--if
      end--if (boopending) then
      break -- to join mark
    end--if

    if (boocccfbl) then
      numcharel = (numchaer-204)*64 + (numchaes-128) -- begins at $CC
      booisuppr = ((numcharel>=145) and (numcharel<=171))
      booislowr = ((numcharel>=177) and (numcharel<=203))
      if (booisuppr and boowantlower) then
        numdelta = 32 -- UPPER->lower
      end--if
      if (booislowr and booucas) then
        numdelta = -32 -- lower->UPPER
      end--if
      break -- to join mark
    end--if

    if (bood0d3bl) then
      numcharel = (numchaer-208)*64 + (numchaes-128) -- begins at $D0
      booisuppr = (numcharel<=47) -- delta $20 $50
      booislowr = ((numcharel>=48) and (numcharel<=95)) -- delta $20 $50
      if (booisuppr or booislowr) then
        numdelabs = 32
        if ((numcharel<=15) or (numcharel>=80)) then
          numdelabs = 80
        end--if
      end--if
      if (booisuppr and boowantlower) then
        numdelta = numdelabs -- UPPER->lower
      end--if
      if (booislowr and booucas) then
        numdelta = -numdelabs -- lower->UPPER
      end--if
      break -- to join mark
    end--if

    break -- finally to join mark
  end--while -- fake loop -- join mark

  if ((numlaengden==1) and (numdelta~=0)) then
    strucinut = string.char (numchaer + numdelta) -- no risk of carry here
  end--if
  if ((numlaengden==2) and (numdelta~=0)) then
    numdelcarry = 0
    while ((numchaes+numdelta)>=192) do
       numdelta = numdelta - 64
       numdelcarry = numdelcarry + 1 -- add BIG ENDIAN 6 bits with carry
    end--while
    while ((numchaes+numdelta)<=127) do
       numdelta = numdelta + 64
       numdelcarry = numdelcarry - 1 -- negat add BIG ENDIAN 6 bits with carry
    end--while
    strucinut = string.char (numchaer + numdelcarry) .. string.char (numchaes + numdelta)
  end--if

  return strucinut -- same var for input and output !!!

end--function lfcasegene
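
-- Illustrative sketch only (NOT called by this module): why the case delta
-- needs a carry. A code point $0080...$07FF is stored in 2 octets with only
-- 6 payload bits in the later octet, thus the 2 letters of a case pair can
-- sit under different beginning octets. The name "lfdemoutf8two" is
-- hypothetical and the range of the input is NOT checked.

local function lfdemoutf8two (numcodepoint)
  local numlead = 192 + math.floor (numcodepoint / 64) -- $C0 + upper 5 bits
  local numtrail = 128 + (numcodepoint % 64) -- $80 + lower 6 bits
  return string.char (numlead,numtrail)
end--function lfdemoutf8two

-- lfdemoutf8two(255) gives $C3,$BF (lowercase "y" with doubledot, $00FF)
-- lfdemoutf8two(376) gives $C5,$B8 (uppercase "Y" with doubledot, $0178)
-- the case delta is 121 and the beginning octet moves by 2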

------------------------------------------------------------------------

-- Local function LFXCASEULT

-- Adjust letter case of beginning letter or all letters in a word or group of
-- words to upper or lower, limited unicode support (generous LFCASEGENE).

-- See LFFIXCASE for ASCII-only version.

-- Input  : * strenigo : word or group of words (may be empty)
--          * booupcas : "true" for uppercase and "false" for lowercase
--          * boodoall : "true" to adjust all letters, "false" only beginning

-- This sub depends on "MATH FUNCTIONS"\"mathmod" and
-- "MATH FUNCTIONS"\"mathbittest" and "STRING FUNCTIONS"\"lftestuc" and
-- "STRING FUNCTIONS"\"lftestlc" and "UTF8 FUNCTIONS"\"lfutf8length" and
-- "UTF8 FUNCTIONS"\"lfcasegene" (generous LFCASEGENE).

local function lfxcaseult (strenigo, booupcas, boodoall)

  local numlein = 0
  local numposi = 1 -- octet position ONE-based
  local numcut = 0 -- length of an UTF8 char
  local bootryadj = false -- try to adjust single char
  local strte7mp = ""
  local strelygo = ""

  numlein = string.len (strenigo)
  while (true) do
    if (numposi>numlein) then
      break -- done
    end--if
    bootryadj = (boodoall or (numposi==1))
    numcut = lfutf8length(string.byte(strenigo,numposi,numposi))
    if ((numcut==0) or ((numposi+numcut-1)>numlein)) then
      numcut = 1 -- skip ie copy one faulty octet
      bootryadj = false
    end--if
    strte7mp = string.sub (strenigo,numposi,(numposi+numcut-1)) -- 1...4 oct
    if (bootryadj) then
      strte7mp = lfcasegene(strte7mp,booupcas) -- (generous LFCASEGENE)
    end--if
    strelygo = strelygo .. strte7mp -- this can be slow
    numposi = numposi + numcut
  end--while
  return strelygo

end--function lfxcaseult

------------------------------------------------------------------------

---- ORDINARY LOCAL HIGH LEVEL FUNCTIONS ----

------------------------------------------------------------------------

-- Local function LFINSERTULTIM

-- Insert selected extra strings into a given string at given positions
-- with optional discarding if the insertable item is empty. Discarding is
-- protected from access out of range by clamping.

-- Input  : * strmdata -- main data string with control cod (syntax see below)
--          * tabinseert -- not-string is safe and has same effect as empty
--                          string, "nil" or empty string "" are preferred
-- Output : * strhazil

-- syntax of insertion and discarding magic string:
-- "@" followed by 2 uppercase letters and 2 hex numbers
-- otherwise the hit is not processed, but copied as-is instead
-- 2 letters select the insertable item from table supplied by the caller
-- 2 hex numbers control discarding left and right (0...15 char:s)

-- empty item is legal and results in discarding if some number is non-ZERO

-- if uppercasing or other adjustment is needed then the caller must take
-- care of it in the form of 2 or more separate items provided in the table

-- This sub depends on "STRING FUNCTIONS"\"lftestuc"
-- and "CONVERSION FUNCTIONS"\"lfonehextoint".

local function lfinsertultim (strmdata,tabinseert)

  local varduahuruf = 0
  local strhazil = ''
  local numdatalen = 0
  local numdatainx = 0
  local numdataoct = 0 -- maybe @
  local numdataodt = 0 -- UC
  local numdataoet = 0 -- UC
  local numammlef = 0 -- hex and discard left
  local numammrig = 0 -- hex and discard right
  local boogotmagic = false

  numdatalen = string.len(strmdata)
  numdatainx = 1 -- ONE-based

  while (true) do -- genuine loop, "numdatainx" is the counter
    if (numdatainx>numdatalen) then -- beware of risk of overflow below
      break -- done (ZERO iterations possible)
    end--if
    boogotmagic = false
    numdataoct = string.byte(strmdata,numdatainx,numdatainx)
    numdatainx = numdatainx + 1
    while (true) do -- fake loop
      if ((numdataoct~=64) or ((numdatainx+3)>numdatalen)) then
        break -- no hit here
      end--if
      numdataodt = string.byte(strmdata, numdatainx   , numdatainx   )
      numdataoet = string.byte(strmdata,(numdatainx+1),(numdatainx+1))
      if ((lftestuc(numdataodt)==false) or (lftestuc(numdataoet)==false)) then
        break -- no hit here
      end--if
      numammlef = string.byte(strmdata,(numdatainx+2),(numdatainx+2))
      numammrig = string.byte(strmdata,(numdatainx+3),(numdatainx+3))
      numammlef = lfonehextoint (numammlef)
      numammrig = lfonehextoint (numammrig)
      boogotmagic = ((numammlef~=255) and (numammrig~=255))
      break
    end--while -- fake loop
    if (boogotmagic) then
      numdatainx = numdatainx + 4 -- consumed 5 char:s, cannot overflow here
      varduahuruf = string.char (numdataodt,numdataoet)
      varduahuruf = tabinseert[varduahuruf] -- risk of type "nil"
      if (type(varduahuruf)~="string") then
        varduahuruf = '' -- type "nil" or invalid type gives empty string
      end--if
      if (varduahuruf=='') then
        numdataoct = string.len(strhazil) - numammlef -- this can underflow
        if (numdataoct<=0) then
          strhazil = ''
        else
          strhazil = string.sub(strhazil,1,numdataoct) -- discard left
        end--if
        numdatainx = numdatainx + numammrig -- discard right this can overflow
      else
        strhazil = strhazil .. varduahuruf -- augment
      end--if
    else
      strhazil = strhazil .. string.char(numdataoct) -- copy char as-is
    end--if (boogotmagic) else
  end--while

  return strhazil

end--function lfinsertultim

------------------------------------------------------------------------

-- Local function LFFINDITEMS

-- Input  : * long string where to search
--          * even number of char:s fe "WCWU" what to search
-- Output : * bool

local function lffinditems (strwhere, strandevenwhat)

  local strcxztvaa = ''
  local numcxzlen = 0
  local numcxzind = 1 -- ONE-based step TWO
  local boofoundthecrap = false

  numcxzlen = string.len(strandevenwhat)
  while (true) do
    if ((numcxzind+1)>numcxzlen) then
      break -- not found
    end--if
    strcxztvaa = "@" .. string.sub(strandevenwhat,numcxzind,(numcxzind+1))
    boofoundthecrap = (string.find(strwhere,strcxztvaa,1,true)~=nil)
    if (boofoundthecrap) then
      break -- found
    end--if
    numcxzind = numcxzind + 2
  end--while
  return boofoundthecrap

end--function lffinditems

------------------------------------------------------------------------

-- Local function LFLEFTRIGHT

local function lfleftright (strbigleft, strbigright)
  local strwikilink = ''
  if (strbigleft==strbigright) then
    strwikilink = strbigleft -- save bloat
  else
    strwikilink = strbigleft .. '|' .. strbigright -- here genuine wall needed
  end--if
  strwikilink = '[[' .. strwikilink .. ']]' -- always link
  return strwikilink
end--function lfleftright

------------------------------------------------------------------------

-- Local function LFFILLKATON

-- Add one string and maybe one bool to global "qtabktaoj" provided the
-- string is nonempty and not yet in and there is some space left.

-- This function has exclusive write access to "qtabktaoj". Do NOT write
-- to it in any other way except during early initialization.

-- We allow max 16 cat:s from auto split or split control parameter and
-- max 4 cat:s from extra parameter but there is a sum limit of 18.

local function lffillkaton (stritem, boomain)
  local numsrchindex = 0
  local varpeek = 0
  while (true) do
    if ((stritem=='') or (numsrchindex==18)) then
      numsrchindex = 18 -- an empty string is never stored
      break -- nothing to add or no free slot left
    end--if
    varpeek = qtabktaoj[numsrchindex]
    if (varpeek==stritem) then
      numsrchindex = 18
      break -- already in
    end--if
    if (varpeek==nil) then
      break -- found free slot
    end--if
    numsrchindex = numsrchindex + 1
  end--while
  if (numsrchindex~=18) then
    qtabktaoj[numsrchindex] = stritem
    if (boomain) then
      qtabktaoj[numsrchindex+20] = true
    end--if
  end--if
end--function lffillkaton
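
-- Usage sketch for "lffillkaton" (illustrative only, assuming "qtabktaoj"
-- is still empty, the cat names are made up):
--   lffillkaton ('Kat A',true)    -- qtabktaoj[0] = 'Kat A', qtabktaoj[20] = true
--   lffillkaton ('Kat B',false)   -- qtabktaoj[1] = 'Kat B', qtabktaoj[21] stays "nil"
--   lffillkaton ('Kat A',false)   -- already in, nothing is written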

------------------------------------------------------------------------

-- Local function LFGET345NONIL

-- we read from global "contabktaoj" index 3...5

-- "nummortyyp" mortyp "W" has code 87 and gives index 3 or 4
-- "nummortyyp" mortyp other has code < 87 (ZERO is safe) and gives index 5
-- "boofraazo" can be assigned to "false" if not needed (index 5)

local function lfget345nonil (nummortyyp, boofraazo)

  local strctlstring = ''
  local numpiinx = 0 -- temp 3...5

  if (nummortyyp==87) then
    numpiinx = 3 -- vortgrupo contains "W"
    if (boofraazo) then
      numpiinx = 4 -- kalimat contains "W"
    end--if
  else
    numpiinx = 5 -- word can contain C I M N P U but obviously not "W"
  end--if
  strctlstring = contabktaoj[numpiinx] -- pick main data string risk for "nil"
  if (type(strctlstring)~="string") then
    strctlstring = ''
  end--if

  return strctlstring -- can be empty but NOT type "nil"

end--function lfget345nonil
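
-- Usage sketch for "lfget345nonil" (illustrative only):
--   lfget345nonil (87,false)   -- contabktaoj[3] (vortgrupo) or ''
--   lfget345nonil (87,true)    -- contabktaoj[4] (frazo) or ''
--   lfget345nonil (77,false)   -- contabktaoj[5] (vorto) or ''
--   lfget345nonil (0,false)    -- contabktaoj[5] too, ZERO is safe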

------------------------------------------------------------------------

-- Local function LFSPLITAA

-- Perform the automatic multiword split or assisted split controlled
-- by 2 prevalidated tables.

-- Note that the split can sort of fail and return same string, most notably
-- if no split boundaries exist, or some do exist but all are blocked.

-- Counting of the boundaries is tricky. We DO count the suppressed ones but
-- do NOT count multiple consecutive non-letters more than once. Thus the
-- boundaries are between words only and at begin and end, there CANNOT
-- be empty content between 2 boundaries. We usually have 2 faked empty
-- boundaries at begin and end, but they can also be real and count then.

-- For example "AND YES, we !,definit-ely,! can." contains 5 words (that can
-- become 5 output fragments numbered 0...4) and 5 input boundaries
-- (numbered 0...4). In the text "?va?" there are 2 boundaries, at begin
-- and end.

-- We need sub "lfinsertultim" (2 para) and table "contabktaoj"
-- controlling the structure of the cat name. "boomorfium" must be
-- false unless lng in "tabkoudo" is valid and known.

-- Names of the categories are built from "contabktaoj" index 3 (vortgrupo)
-- or 4 (frazo) but here not 5 (vorto, useful for manual split). Categories
-- are brewed only if "boomorfium" is true, the split does not fail, and the
-- individual fragment is not blocked. For example "va" will neither link nor
-- categorize but "va?" will do both. The "#"..."N"-syntax blocks both linking
-- and morpheme categorization (if the latter is enabled otherwise). If
-- linking is blocked for another reason (most notably only 1 fragment
-- generated after split attempt) then categorization is suppressed as well.

-- Input  : * "strlemmain"   -- input text (pagename)
--          * "tabblokr"     -- index 0...15 holes permitted, from "%"
--          * "tablinker"    -- index 0...15 holes permitted, from "#"
--          * "boomorfium"   -- "true" if compound cat:s are desired
--          * "bookalimat"   -- "true" if word class "KA" was specified
--          * "tabkoudo"     -- lng stuff ("??" legal but needs "boomorfium")
-- Output : * "stromong"     -- wikitext to be sent to screen

-- This function fills global "qtabktaoj" index [0]...[15] with names of
-- morpheme cat:s (index [20]...[35] main page status not used here).

-- This sub depends on "UTF8 FUNCTIONS"\"lfxcaseult" (generous) and
-- "HIGH LEVEL FUNCTIONS"\"lfinsertultim" and
-- "HIGH LEVEL FUNCTIONS"\"lffillkaton" and
-- "HIGH LEVEL FUNCTIONS"\"lfget345nonil" and
-- "HIGH LEVEL FUNCTIONS"\"lfleftright".

local function lfsplitaa (strlemmain, tabblokr, tablinker, boomorfium, bookalimat, tabkoudo)

  local varrisktabl = 0 -- can be type "nil"
  local strfragment = ''
  local strfragdext = '' -- right part with visible text (wall not included)
  local stromong = '' -- final result
  local strkattcty = ''
  local strkatoon = '' -- for "lffillkaton"
  local numloonginp = 0 -- length of input
  local numinxed = 0 -- ZERO-based index of input char:s
  local numboundrinp = 0 -- counter of detected boundaries including suppressed ones
  local numoutfrag = 0 -- counter of produced fragments
  local numotcot = 0
  local numotcet = 0
  local numotcuu = 0 -- control code from "tablinker" (ZERO is "nil" ie none)
  local boohavechar = false
  local booqboueof = false -- combo status: boundary char or end of string
  local booprevqbe = false -- previous combo status
  local boosuppress = false -- suppress split but still do count the boundary
  local boodolnkkat = false -- do link and maybe categorize the fragment

  numloonginp = string.len(strlemmain)

  while (true) do
    if (numinxed==numloonginp) then
      boohavechar = false
      booqboueof = true -- copied whole string and end of fragment
      boosuppress = false -- last chance, we must output accumulated fragment
    else
      boohavechar = true -- can be part of word or boundary !!!
      numotcot = string.byte (strlemmain,(numinxed+1),(numinxed+1))
      numinxed = numinxed + 1 -- ZERO-based
      booqboueof = ((numotcot==32) or lftestpuncture(numotcot))
      boosuppress = (tabblokr[numboundrinp]=="1")
    end--if
    if (booprevqbe and (booqboueof==false)) then
      numboundrinp = numboundrinp + 1 -- count even suppressed boundaries
    end--if
    booprevqbe = booqboueof -- assign previous status for next round
    if (booqboueof and (boosuppress==false) and (strfragment~='')) then
      strfragdext = strfragment -- visible text right of the wall "|"
      boodolnkkat = false -- preassume no link no cat
      if ((stromong~='') or boohavechar) then -- avoid selflink to page
        varrisktabl = tablinker[numoutfrag] -- can be type "nil"
        numotcuu = 0
        if (type(varrisktabl)=="string") then
          numotcuu = string.byte (varrisktabl,1,1)
        end--if
        if (numotcuu==73) then -- "I" lowercase
          strfragment = lfxcaseult (strfragment,false,false)
        end--if
        if (numotcuu==65) then -- "A" uppercase
          strfragment = lfxcaseult (strfragment,true,false)
        end--if
        if (numotcuu==58) then -- ":" explicit replace
          strfragment = string.sub (varrisktabl,2,string.len(varrisktabl))
        end--if
        boodolnkkat = (numotcuu~=78) -- "boodolnkkat" needed below 2 times
      end--if ((stromong~='') or boohavechar) then
      if (boodolnkkat) then
        stromong = stromong .. lfleftright (strfragment,strfragdext) -- wlink
      else
        stromong = stromong .. strfragment -- add raw fragment no link
      end--if
      if (boomorfium and boodolnkkat) then
        strkattcty = lfget345nonil (87,bookalimat) -- always "W" thus 5 imposs
        numotcet = string.len(strkattcty) -- this is automatic or assisted
        if (numotcet>=2) then
          tabkoudo["WC"] = nil -- no stupid word class here
          tabkoudo["WU"] = nil -- no stupid word class here
          tabkoudo["MT"] = nil -- a word does not have any morpheme type
          tabkoudo["FR"] = strfragment
          strkatoon = lfinsertultim (strkattcty,tabkoudo)
          lffillkaton (strkatoon,false) -- NOT main page -- "qtabktaoj"
        end--if (numotcet>=2) then
      end--if (boomorfium and boodolnkkat) then
      strfragment = ''
      numoutfrag = numoutfrag + 1 -- count fragments "lffillkaton" separately
    end--if (booqboueof and (boosuppress==false) and (strfragment~='')) then
    if (boohavechar) then
      if (booqboueof and (boosuppress==false)) then
        stromong = stromong .. string.char(numotcot) -- add non-linkable char
      else
        strfragment = strfragment .. string.char(numotcot) -- add chr to fragm
      end--if
    else
      break -- done all
    end--if
  end--while

  return stromong

end--function lfsplitaa
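
-- Usage sketch for "lfsplitaa" (illustrative only, with both control tables
-- empty and categorization off, assuming that "lftestpuncture" accepts "?"):
--   lfsplitaa ('va',{},{},false,false,{})          -- 'va' (single fragment, no link)
--   lfsplitaa ('va?',{},{},false,false,{})         -- '[[va]]?'
--   lfsplitaa ('bona tago',{},{},false,false,{})   -- '[[bona]] [[tago]]'
-- With "tablinker" holding 'N' at index 1 the second fragment stays unlinked:
--   lfsplitaa ('bona tago',{},{[1]='N'},false,false,{})   -- '[[bona]] tago'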

------------------------------------------------------------------------

-- Local function LFSPLITMN

-- Perform the manual split controlled by one prevalidated table. Actually
-- the table contains the presplit complete lemma and the pagename is not
-- needed at all. Max 16 fragments can come in, type "F000" does count. We
-- rely on all details being prevalidated (number of fragments, plusses and
-- rectangular brackets, colons and slashes, only valid uppercase letters
-- before colon, legal use of "L:", ...).

-- We need sub "lfinsertultim" (2 para) and table "contabktaoj"
-- controlling the structure of the cat name. "boomorkat" must be
-- false unless lng in "tabkuodo" is valid and known.

-- Names of the categories are built from "contabktaoj" index 3 (vortgrupo)
-- or 4 (frazo) or 5 (vorto).

-- The source string uses slashes "/" as field separator but the destination
-- string uses walls "|".

-- Omitting deleted characters and dash adding are performed only for
-- fragment type "F210" ie only one field after ":" and no slash "/".
-- Also "L" is permitted for fragment type "F210" only but this is
-- prevalidated. Note that in the early prevalidation step the debracketing
-- for the "sum check" is NOT limited to fragment type "F210".

-- We have to maintain 2 separate fragment counters. For example valid syntax
-- "[M:kung]+a+[M:doeme]" gives 3 input fragments in "tabmnfragoj", but only
-- 2 output fragments in "qtabktaoj", and we want them to have indexes 0
-- and 1, not 0 and 2. The out counter is not explicit, it is the content
-- of "qtabktaoj" processed in "lffillkaton".

-- There is a problem with the wikisyntax, for example "[[no]]pe" will act as
-- "[[no|nope]]" ie the visible link text will continue beyond the bracket
-- and cover the "pe", whereas "[[no]]??" does not trigger such behavior. To
-- prevent this from happening we must add something invisible, and we use
-- "<i></i>".

-- here we DO introduce wikilinks with double brackets and walls
-- here we DO expand "+" to " + " (between fragments)
-- here we DO add dashes to some affixes (fragment type "F210")
-- here we do NOT carry out the "sum check" (done in the prevalidation)

-- Input  : * "tabmnfragoj"    -- prevalidated presplit table "+[I:bug/BUG]"
--          * "boomorkat"      -- "true" if compound cat:s are desired
--          * "bookalymat"     -- "true" if word class "KA" was specified
--          * "tabkuodo"       -- lng stuff ("??" legal but needs "boomorkat")
-- Output : * "strumung"       -- wikitext to be sent to screen

-- This function fills global "qtabktaoj" index [0]...[15] with names of
-- morpheme cat:s (index [20]...[35] main page status not used here).

-- This sub depends on "STRING FUNCTIONS"\"lftestuc" and
-- "STRING FUNCTIONS"\"lfdebracket" and "STRING FUNCTIONS"\"lfremove2bra" and
-- "STRING FUNCTIONS"\"lfaddthedash" and
-- "HIGH LEVEL FUNCTIONS"\"lfinsertultim" and
-- "HIGH LEVEL FUNCTIONS"\"lffinditems" and
-- "HIGH LEVEL FUNCTIONS"\"lffillkaton" and
-- "HIGH LEVEL FUNCTIONS"\"lfget345nonil" and
-- "HIGH LEVEL FUNCTIONS"\"lfleftright".

local function lfsplitmn (tabmnfragoj, boomorkat, bookalymat, tabkuodo)

  local varrysktabl = 0 -- from in table can be type "nil"
  local strumung = '' -- final result with links
  local strwalzleft = ''
  local strwallrght = ''
  local strwallcatg = '' -- same as "strwalzleft" unless "L"-trick is used
  local strkattctx = ''
  local strkatton = '' -- for "lffillkaton"
  local numinnfrog = 0 -- counter in "tabmnfragoj" type "F000" does count
  local numlenfrago = 0 -- ONE-based last valid index
  local numivnxed = 0 -- ONE-based index of char:s inside fragment
  local numcuaar = 0
  local numcuabr = 0 -- +1
  local numcuacr = 0 -- +2
  local numcom1of79z = 0 -- 0 | 67 C 73 I 76 L 77 M 78 N 80 P 85 U | 87 W
  local booeldtrick = false -- true for the "L"-trick giving type "N"
  local booright = false -- false left | true right
  local boohavecolon = false
  local boo210magic = false -- enhance and strip then
  local booneedmor = false

  while (true) do -- outer loop counts fragments in table
    booeldtrick = false -- separate verdict for every fragment
    boohavecolon = false -- separate verdict for every fragment
    boo210magic = false -- separate verdict for every fragment
    numcom1of79z = 0 -- default none, separate verdict for every fragment
    varrysktabl = tabmnfragoj [numinnfrog] -- can be type "nil" !!!
    numinnfrog = numinnfrog + 1
    if (type(varrysktabl)~="string") then
      break -- give up on "nil"
    end--if
    numlenfrago = string.len (varrysktabl) -- cannot be empty
    numivnxed = 1 -- ONE-based
    numcuaar = string.byte (varrysktabl,1,1)
    if (numcuaar==43) then
      numivnxed = 2 -- ONE-based skip the "+" even for type "F000" far below
      strumung = strumung .. ' + ' -- add the spaces here
      numcuaar = string.byte (varrysktabl,2,2) -- pick new char cannot be "+"
    end--if
    if (numcuaar==91) then -- bracketed []-fragment processed char-by-char
      numivnxed = numivnxed + 1 -- now at least 2
      strwalzleft = ''
      strwallrght = ''
      booright = false
      numcuabr = 0
      numcuacr = 0 -- minimal fe "[M:x]" 5 char:s 1...5 or 2...6
        if ((numlenfrago-numivnxed)>=3) then
          numcuabr = string.byte (varrysktabl,numivnxed,numivnxed)
          numcuacr = string.byte (varrysktabl,(numivnxed+1),(numivnxed+1))
        end--if
        if ((numcuacr==58) and lftestuc(numcuabr)) then
          numcom1of79z = numcuabr -- "numcuabr" is prevalidated ;-)
          numivnxed = numivnxed + 2 -- eat it away too
          boohavecolon = true -- fragment type "F210" or "F211"
          if (numcom1of79z==76) then
            booeldtrick = true -- fe "fer(o)" -> link "fero" and categ "fer"
            numcom1of79z = 78 -- "L" -> "N"
          end--if
        end--if
        while (true) do -- inner loop counts char:s in a bracketed []-fragment
          if (numivnxed==numlenfrago) then
            break -- skip trailing ']' guaranteed to exist
          end--if
          numcuaar = string.byte (varrysktabl,numivnxed,numivnxed)
          if (booright) then
            strwallrght = strwallrght .. string.char (numcuaar) -- wall NOT po
          else
            if (numcuaar==47) then
              booright = true -- source separating slash "/"
            else
              strwalzleft = strwalzleft .. string.char (numcuaar)
            end--if
          end--if
          numivnxed = numivnxed + 1
        end--while
        if (strwallrght=='') then
          strwallrght = strwalzleft -- type "F200" or "F210"
          boo210magic = boohavecolon -- magic qualifies only if type is F210
        end--if
        if (boo210magic) then -- try enhance left fe "il" -> "-il-"
          if (numcom1of79z==80) then
            strwalzleft = lfaddthedash (strwalzleft,false,true) -- P
          end--if
          if (numcom1of79z==85) then
            strwalzleft = lfaddthedash (strwalzleft,true,false) -- U
          end--if
          if (numcom1of79z==73) then
            strwalzleft = lfaddthedash (strwalzleft,true,true) -- I
          end--if
        end--if
        strwallcatg = strwalzleft -- seize after enhancing before stripping
        if (boo210magic) then -- always strip but in various ways
          strwalzleft = lfremove2bra (strwalzleft) -- link "kac(o)" -> "kaco"
          if (booeldtrick) then -- "L" -> "N"
            strwallcatg = lfdebracket (strwallcatg,true,1) -- for the category
          else
            strwallcatg = lfremove2bra (strwallcatg) -- for the category
          end--if
        end--if
      strumung = strumung .. lfleftright (strwalzleft,strwallrght) .. '<i></i>' -- always link
      if (boomorkat and (numcom1of79z~=0)) then
        strkattctx = lfget345nonil (numcom1of79z,bookalymat) -- 3 or 4 or 5
        numcuaar = string.len(strkattctx) -- this is the manual split
        if (numcuaar>=2) then
          booneedmor = lffinditems(strkattctx,"MT") -- need it ??
          tabkuodo["WC"] = nil -- no stupid word class here
          tabkuodo["WU"] = nil -- no stupid word class here
          if (booneedmor) then
            tabkuodo["MT"] = string.char(numcom1of79z) -- morpheme type
          else
            tabkuodo["MT"] = nil -- no morpheme type here
          end--if
          tabkuodo["FR"] = strwallcatg -- fragment or word
          strkatton = lfinsertultim (strkattctx,tabkuodo)
          lffillkaton (strkatton,false) -- NOT main page -- "qtabktaoj"
        end--if (numcuaar>=2) then
      end--if (boomorkat and (numcom1of79z~=0)) then
    else
      strumung = strumung .. string.sub (varrysktabl,numivnxed,numlenfrago) -- copy type F000 as-is
    end--if (numcuaar==91) else
  end--while

  return strumung

end--function lfsplitmn
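
-- Usage sketch for "lfsplitmn" (illustrative only, categorization off). The
-- fragments are ZERO-based and carry their leading "+". The result assumes
-- that "lftestuc" accepts "M" and that "lfremove2bra" leaves text without
-- round brackets unchanged:
--   lfsplitmn ({[0]='[M:kung]',[1]='+a',[2]='+[M:doeme]'},false,false,{})
--   -- expected '[[kung]]<i></i> + a + [[doeme]]<i></i>'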

------------------------------------------------------------------------

-- Local function LFSPLITSI

-- Perform the simple root split (3, "$S") or simple bare
-- root (4, "$B") strategy. Pagename is needed.

-- $S simple root split    suno  -> sun        + [-o/o] kat "N!sun" + "U:-o"
--                         Suno  -> [suno/Sun] + [-o/o] kat "N!sun" + "U:-o"
-- $B simple bare root     sun   -> sun                 kat "M!sun"
--                         Sun   -> [sun/Sun]           kat "M!sun"
-- $B simple bare root NR  #     -> #                   kat "N!#"
--                                ("#" represents a Chinese letter)

-- Note that for $S the mortyp is always "N" (nonstandalone) whereas
-- for $B it can be either "M" (standalone) or "N".

-- We need sub "lfinsertultim" (2 para) and table "contabktaoj"
-- controlling the structure of the cat name. "bookomdez" must be
-- false unless lng in "tablngbah" is valid and known.

-- Names of the categories are built from "contabktaoj" index 5 (vorto).

-- Input  : * "strhalaman"     -- input lemma ie pagename
--          * "numkodsplit"    -- 3 or 4 for $S or $B
--          * "bookomdez"      -- "true" if compound cat:s are desired at all
--          * "boonitro"       -- "true" if word class is NR
--          * "tablngbah"      -- lng stuff ("??" legal but needs "bookomdez")
-- Output : * "strymyng"       -- wikitext to be sent to screen

-- This function fills global "qtabktaoj" index [0]...[15] with names of
-- morpheme cat:s and maybe index [20]...[35] with main page status.
-- In fact only one index (probably [20]) can receive the "true" here.

-- This sub depends on "UTF8 FUNCTIONS"\"lfxcaseult" (generous) and
-- "HIGH LEVEL FUNCTIONS"\"lfinsertultim" and
-- "HIGH LEVEL FUNCTIONS"\"lffillkaton" and
-- "HIGH LEVEL FUNCTIONS"\"lfget345nonil".

local function lfsplitsi (strhalaman, numkodsplit, bookomdez, boonitro, tablngbah)

  local strtakkctx = '' -- contabktaoj[5] index 5 is hardcoded
  local strymyng = '' -- screen
  local strlover = '' -- brewed from "strhalaman" : "Suno" -> "suno"
  local strnolast = '' -- brewed from "strhalaman" : "Suno" -> "Sun"
  local strnolaslow = '' -- brewed from "strlover" : "Suno" -> "sun"
  local strkatroot = ''
  local strcatoton = ''
  local nummortyp = 0 -- 77 "M" or 78 "N" only
  local numdewsx = 0
  local numlasst = 0 -- last char of lemma or ZERO if not separated
  local numcauar = 0
  local numcaubr = 0
  local booindeedlow = false

  numdewsx = string.len (strhalaman)
  strlover = lfxcaseult(strhalaman,false,false)
  booindeedlow = (strlover==strhalaman)
  numlasst = 0 -- needed far below
  nummortyp = 77 -- "M"
  if (boonitro or (numkodsplit==3)) then
    nummortyp = 78 -- "N"
  end--if
  if (numkodsplit==3) then
    strnolast = string.sub (strhalaman,1,(numdewsx-1)) -- cut off last octet
    strnolaslow = string.sub (strlover,1,(numdewsx-1)) -- cut off & lowercase
    numlasst = string.byte (strhalaman,numdewsx,numdewsx) -- last octet (ending must be ASCII), needed far below
    if (booindeedlow) then
      strymyng = strnolast -- as-is lowercase
    else
      strymyng = '[[' .. strlover .. '|' .. strnolast .. ']]' -- link
    end--if
    strymyng = strymyng .. ' + [[-' .. string.char(numlasst) .. '|' .. string.char(numlasst) .. ']]'
    strkatroot = strnolaslow -- $S
  end--if
  if (numkodsplit==4) then
    if (booindeedlow) then
      strymyng = strhalaman -- as-is lowercase
    else
      strymyng = '[[' .. strlover .. '|' .. strhalaman .. ']]' -- link
    end--if
    strkatroot = strlover -- $B
  end--if

  if (bookomdez) then
    strtakkctx = lfget345nonil (0,false) -- pick main data string 5 hardco
    numcauar = string.len(strtakkctx) -- simple "strtakkctx" can be used twice
    if (numcauar>=2) then
      tablngbah["WC"] = nil -- no stupid word class here
      tablngbah["WU"] = nil -- no stupid word class here
      tablngbah["MT"] = string.char(nummortyp)
      tablngbah["FR"] = strkatroot
      strcatoton = lfinsertultim (strtakkctx,tablngbah)
      lffillkaton (strcatoton,true) -- YES main page -- "qtabktaoj"
      if (numlasst~=0) then
        tablngbah["MT"] = 'U' -- last letter is suffix "U"
        tablngbah["FR"] = '-' .. string.char(numlasst)
        strcatoton = lfinsertultim (strtakkctx,tablngbah)
        lffillkaton (strcatoton,false) -- NOT main page -- "qtabktaoj"
      end--if
    end--if (numcauar>=2) then
  end--if (bookomdez) then

  return strymyng

end--function lfsplitsi
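
-- Usage sketch for "lfsplitsi" (illustrative only, categorization off,
-- assuming that "lfxcaseult" turns "Suno" into "suno" and "Sun" into "sun"):
--   lfsplitsi ('suno',3,false,false,{})   -- 'sun + [[-o|o]]'
--   lfsplitsi ('Suno',3,false,false,{})   -- '[[suno|Sun]] + [[-o|o]]'
--   lfsplitsi ('Sun',4,false,false,{})    -- '[[sun|Sun]]'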

------------------------------------------------------------------------

-- Local function LFSPLITZH

-- Perform the large letter split (5, "$H").

-- The lemma is split into single letters. This is most useful for but
-- not restricted to Chinese ones. Note that for this split the mortyp is
-- always "M" (standalone). Use manual split for other cases.

-- We need sub "lfinsertultim" (2 para) and table "contabktaoj"
-- controlling the structure of the cat name. "bookomdoz" must be
-- false unless lng in "tablngbaih" is valid and known.

-- Names of the categories are built from "contabktaoj" index 5 (vorto).

-- Input  : * "strhilaman"     -- input lemma ie pagename
--          * "bookomdoz"      -- "true" if compound cat:s are desired at all
--          * "tablngbaih"     -- lng stuff ("??" legal but needs "bookomdoz")
-- Output : * "strygyng"       -- wikitext to be sent to screen

-- This function fills global "qtabktaoj" index [0]...[15] with names of
-- morpheme cat:s (index [20]...[35] main page status not used here).

-- This sub depends on "UTF8 FUNCTIONS"\"lfutf8length" and
-- "HIGH LEVEL FUNCTIONS"\"lfinsertultim" and
-- "HIGH LEVEL FUNCTIONS"\"lffillkaton" and
-- "HIGH LEVEL FUNCTIONS"\"lfget345nonil".

local function lfsplitzh (strhilaman, bookomdoz, tablngbaih)

  local strtookctj = '' -- contabktaoj[5] index 5 is hardcoded
  local strygyng = '' -- screen
  local strbeexess = ''
  local strcatatan = ''
  local numinwwlen = 0
  local numwwindex = 1 -- ONE-based
  local numwwchar = 0
  local numwwlen = 0

  numinwwlen = string.len(strhilaman)

  while (true) do -- genuine loop, counter is "numwwindex" step 1...4
    if (numwwindex>numinwwlen) then
      break -- done (risk of overflow)
    end--if
    numwwchar = string.byte (strhilaman,numwwindex,numwwindex)
    numwwlen = lfutf8length (numwwchar)
    if (numwwlen==0) then
      strygyng = strhilaman -- invalid lead octet, return the lemma unsplit
      break -- some compound cat:s may be left behind :-(
    end--if
    strbeexess = string.sub (strhilaman,numwwindex,(numwwindex+numwwlen-1))
    if (strygyng~='') then
      strygyng = strygyng .. ' + '
    end--if
    strygyng = strygyng .. '[[' .. strbeexess .. ']]'
    if (bookomdoz) then
      strtookctj = lfget345nonil (0,false) -- pick main data string 5 hardco
      numwwchar = string.len(strtookctj) -- this is large letter split
      if (numwwchar>=2) then
        tablngbaih["WC"] = nil -- no stupid word class here
        tablngbaih["WU"] = nil -- no stupid word class here
        tablngbaih["MT"] = 'M'
        tablngbaih["FR"] = strbeexess
        strcatatan = lfinsertultim (strtookctj,tablngbaih)
        lffillkaton (strcatatan,false) -- NOT main page -- "qtabktaoj"
      end--if (numwwchar>=2) then
    end--if (bookomdoz) then
    numwwindex = numwwindex + numwwlen -- step 1...4 risk of overflow
  end--while

  return strygyng

end--function lfsplitzh
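
-- Usage sketch for "lfsplitzh" (illustrative only, categorization off,
-- assuming that "lfutf8length" returns the UTF8 sequence length 1...4 for a
-- valid lead octet):
--   lfsplitzh ('abc',false,{})    -- '[[a]] + [[b]] + [[c]]'
--   lfsplitzh ('中文',false,{})   -- '[[中]] + [[文]]'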

------------------------------------------------------------------------

---- MAIN EXPORTED FUNCTION ----

------------------------------------------------------------------------

function splitter.ek (arxframent)

  -- general unknown type

  local vartmp = 0      -- variable without type

  -- special type "args" AKA "arx"

  local arxspecial = 0  -- metaized "args"

  -- general tab in from caller ("qtabktaoj" is elsewhere)

  local tabbluck        = {}  -- from "%"-syntax assi
  local tablynx         = {}  -- from "#"-syntax assi
  local tabmnfrags      = {}  -- for manual split
  local tabextfriig     = {}  -- from extra parameter
  local tablngdbl       = {}  -- double-letter indexes

  -- general str ("qstrtrace" is elsewhere)

  local strkaatctl = ''  -- picked from "contabktaoj" via "lfget345nonil"
  local strlemmain = ''  -- lemma in
  local strlemmaut = ''  -- bold lemma (maybe split) out
  local strtmp     = ''  -- temp

  -- general num

  local numsplyt   = 0 -- split strategy (0 auto 1 assi auto 2 manu 3 srs 4 sbr 5 zh 7 none)

  local numtamp   = 0
  local numoct    = 0
  local numodt    = 0
  local numlindex = 0

  -- general boo from caller

  local boocatdesir = false
  local booexteval  = false  -- true if we got the extra parameter
  local boohavnyrr  = false  -- true if we got "NR"
  local boohavkall  = false  -- true if we got "KA"

  -- general boo

  local booerr    = false
  local bootrace  = false  -- hardcoded

  ---- ASSIGN AND BOAST ----

  qstrtrace = '<br>This is "msplitter" submodule.' -- unconditional

  ---- GET THE ARX ----

  arxspecial = arxframent.args
  while (true) do -- fake loop
    if (type(arxspecial)~="table") then
      booerr = true
      break
    end--if
    boocatdesir = arxspecial[ 0]
    strlemmain  = arxspecial[ 1]
    numsplyt    = arxspecial[ 2]
    if ((type(boocatdesir)~="boolean") or (type(strlemmain)~="string") or (type(numsplyt)~="number")) then
      if (bootrace) then
        lftracemsg ('Index 0...2 bad data type') -- "qstrtrace"
      end--if
      booerr = true
      break
    end--if
    tabbluck    = arxspecial[ 3]
    tablynx     = arxspecial[ 4]
    tabmnfrags  = arxspecial[ 5]
    tabextfriig = arxspecial[ 6]
    booexteval  = arxspecial[ 7] -- boolean between tables !!!
    tablngdbl   = arxspecial[ 8]
    boohavnyrr  = arxspecial[ 9] -- NR
    boohavkall  = arxspecial[10] -- KA
    if ((type(booexteval)~="boolean") or (type(tablngdbl)~="table") or (type(boohavnyrr)~="boolean") or (type(boohavkall)~="boolean")) then
      if (bootrace) then
        lftracemsg ('Index 7...10 bad data type') -- "qstrtrace"
      end--if
      booerr = true
    end--if
    break
  end--while -- fake loop

  ---- SPLIT THE LEMMA IF NEEDED ----

  -- process from "strlemmain" (already guaranteed to be
  -- non-empty) to "strlemmaut" (actually NOT for manual split)

  -- "numsplyt" : 0 auto 1 assi auto 2 manu 3 srs 4 sbr 5 zh 7 none

  -- we skip the split and copy only if:
  -- * "numsplyt" is 7 (#S7 no split)

  -- punctuation (5 char:s: ! , . ; ?) 21 33 | 2C 44 | 2E 46 | 3B 59 | 3F 63
  -- dash "-" and apo "'" do NOT count as punctuation (for auto and assi auto)

  -- we depend on "boocatdesir" (they can switch off some cat:s)
  -- we depend on "boohavkall" (switches between "vortgrupo" and "frazo")

  -- "qtabktaoj" is very global
  -- 0...17 cat names without "Category:" prefix, unused "nil"
  -- 20...37 "true" if main page, otherwise "nil"
  -- "lfsplitaa" and "lfsplitmn" and "lfsplitsi" and "lfsplitzh" will
  -- fill it and below more from extra parameter

  if (booerr==false) then
    if (numsplyt<2) then -- ZERO or ONE -> auto or assi auto #S0 #S1
      strlemmaut = lfsplitaa (strlemmain, tabbluck, tablynx, boocatdesir, boohavkall, tablngdbl)
    end--if
    if (numsplyt==2) then -- 2 -> manu #S2
      strlemmaut = lfsplitmn (tabmnfrags, boocatdesir, boohavkall, tablngdbl)
    end--if
    if ((numsplyt==3) or (numsplyt==4)) then -- 3 4 -> simple #S3 #S4
      strlemmaut = lfsplitsi (strlemmain, numsplyt, boocatdesir, boohavnyrr, tablngdbl)
    end--if
    if (numsplyt==5) then -- 5 -> zh #S5
      strlemmaut = lfsplitzh (strlemmain, boocatdesir, tablngdbl)
    end--if
    if (numsplyt==7) then -- 7 -> no split #S7
      strlemmaut = strlemmain -- no split, "strlemmaut" needed for visible part
    end--if
  end--if

  ---- BREW UP TO 4 EXTRA CATEGORIES ----

  -- from extra parameter sent to us in "tabextfriig" and "booexteval"

  -- with "booexteval" true the prevalidated morphemes are in
  -- "tabextfriig" incl prefix fe "C:" or "M!", the caller
  -- converts possible "&"-syntax to 1 or 2 fragments

  -- with "booexteval" false the extra parameter was empty and
  -- we do nothing here

  if ((booerr==false) and boocatdesir and booexteval) then
    numlindex = 0
    while (true) do
      vartmp = tabextfriig[numlindex] -- risk of type "nil"
      if (type(vartmp)=="string") then
        numoct = string.byte(vartmp,1,1) -- C I M N P U W
        numodt = string.byte(vartmp,2,2) -- ":" 58 or "!" 33
        numtamp = string.len (vartmp)
        strtmp = string.sub (vartmp,3,numtamp) -- prevalidated morpheme string
        strkaatctl = lfget345nonil (numoct,boohavkall) -- pick main data str
        numtamp = string.len(strkaatctl) -- this is main brewing 4 extra cat:s
        if (numtamp>=2) then
          local bootimp = lffinditems(strkaatctl,"MT") -- need it ?? (local, do not leak a global)
          tablngdbl["WC"] = nil -- no stupid word class here
          tablngdbl["WU"] = nil -- no stupid word class here
          if (bootimp) then
            tablngdbl["MT"] = string.char(numoct) -- morpheme type
          else
            tablngdbl["MT"] = nil -- no morpheme type here
          end--if
          tablngdbl["FR"] = strtmp
          strtmp = lfinsertultim (strkaatctl,tablngdbl)
          lffillkaton (strtmp,(numodt==33)) -- MAYBE main page -- "qtabktaoj"
        end--if (numtamp>=2) then
      else
        break -- abort at "nil"
      end--if (type(vartmp)=="string") else
      numlindex = numlindex + 1
    end--while
  end--if

  ---- PREPARE RETURN ----

  if (booerr) then
    strlemmaut = "//"
  end--if
  qtabktaoj [40] = strlemmaut -- unconditionally
  qtabktaoj [41] = qstrtrace -- unconditionally, cannot be empty

  ---- RETURN THE RESULT TABLE ----

  return qtabktaoj

end--function
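
-- Calling sketch for "splitter.ek" (illustrative only, the variable names
-- are made up and the lng table is left empty, all indexes are ZERO-based
-- and the caller must prevalidate everything):
--   local tabsplitin = {[0]=false, [1]='bona tago', [2]=0, [3]={}, [4]={},
--     [5]={}, [6]={}, [7]=false, [8]={}, [9]=false, [10]=false}
--   local tabsplitut = splitter.ek ({args=tabsplitin})
--   -- tabsplitut[40] is the (maybe split) lemma or "//" on error
--   -- tabsplitut[41] is the trace text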

  ---- RETURN THE JUNK LUA TABLE ----

return splitter