Discussion:
Lua in-kernel (lbuf library)
Lourival Vieira Neto
2013-10-10 18:15:54 UTC
Hi folks,

It has been a long time since my GSoC project and though I have tried
to come back, I've experienced some personal issues. However, now I'm
coding again.

I'm developing a library to handle buffers in Lua, named lbuf. It is
being developed as part of my efforts to experiment with the kernel
network stack using Lua. Initially, I intended to bind mbuf to allow,
for example, writing protocol dissectors in Lua, such as calling a Lua
function to inspect network packets:

function filter(packet)
  if packet.field == value then return DROP end
  return PASS
end

Thus, I started to design a Lua binding to mbuf inspired by '#pragma
pack' and C bitfields. Then, I realized that this Lua library
could be useful in other kernel (and user-space) areas, such as device
drivers and user-level protocols. So, I started to develop this
binding generically, as an independent library that gives random access
to bits in a buffer. It is still at a very early stage, but I want to
share some thoughts.

Here is a draft of the lbuf API:

C API:

lbuf_new(lua_State *L, void *buffer, size_t length, lua_Alloc free, bool net);

* creates a new lbuf userdatum and pushes it on the Lua stack. The net
flag indicates if it is necessary to perform endianness conversion.

Lua API:

- array access (1)

lbuf:mask(alignment [, offset, length])
buf[ix] ~> accesses 'alignment' bits from 'alignment*(ix -1)+offset' position

e.g.:
buf:mask(3)
buf[3] ~> accesses 3 bits from bit-6 position
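
A minimal sketch in C of the index arithmetic above, added for illustration. It assumes that bit 0 is the most significant bit of the first byte, which the draft does not pin down; get_bits and array_pos are hypothetical names and not part of lbuf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of lbuf: reads 'nbits' (1..32) bits
 * starting at bit position 'pos'. ASSUMPTION: bit 0 is the MSB of byte 0. */
static uint32_t
get_bits(const uint8_t *buf, size_t pos, unsigned nbits)
{
	uint32_t value = 0;

	assert(nbits >= 1 && nbits <= 32);
	for (unsigned i = 0; i < nbits; i++, pos++) {
		unsigned bit = (buf[pos / 8] >> (7 - pos % 8)) & 1u;
		value = (value << 1) | bit;
	}
	return value;
}

/* Bit position of element 'ix' (1-based) after mask(alignment, offset). */
static size_t
array_pos(size_t alignment, size_t ix, size_t offset)
{
	assert(ix >= 1);
	return alignment * (ix - 1) + offset;
}
```

With alignment 3 and no offset, array_pos(3, 3, 0) is 6, so buf[3] would read bits 6, 7 and 8 under this bit-numbering assumption.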

- array access (2)

buf:mask{ length_pos1, length_pos2, ... }
buf[ix] ~> accesses 'length_pos(ix)' bits from 'length_pos1 + ...
length_pos(ix-1)' position

e.g.:
buf:mask{ 2, 2, 32, 9 }
buf[2] ~> accesses 2 bits from bit-2 position
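
For the list form, the start position is the sum of the preceding lengths. A small C sketch of that sum follows; field_pos is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of lbuf: bit position of the ix-th
 * (1-based) element for a mask given as a list of lengths in bits. */
static size_t
field_pos(const size_t *lengths, size_t nfields, size_t ix)
{
	size_t pos = 0;

	assert(ix >= 1 && ix <= nfields);
	/* Sum the lengths of all elements that precede element 'ix'. */
	for (size_t i = 0; i + 1 < ix; i++)
		pos += lengths[i];
	return pos;
}
```

For the mask { 2, 2, 32, 9 }, this gives position 2 for buf[2], position 4 for buf[3] and position 36 for buf[4].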

- fields access

buf:mask{ field = { offset, length }, ... }
buf.field ~> 'field.length' bits from 'offset' position

e.g.:
buf:mask{
type = { 0, 2 },
-- 1 bit padding
flag = { 4, 1 },
xyz = { 15, 17 },
seg = {
flagX = { 32, 1 },
flagY = { 33, 1 },
flagZ = { 34, 1 },
}
}
buf.flag ~> 1 bit from bit-4 position
buf.xyz ~> 17 bits from bit-15 position
buf.seg.flagY ~> 1 bit from bit-34 position

- raw access

buf:rawget(3, 30) ~> gets 30 bits from bit-3 position
buf:rawset(3, 30, value) <~ sets 'value' into 30 bits from bit-3 position

- segment

buf:segment(offset [, length])

returns a new lbuf corresponding to a segment of 'buf'.

- mask reusing

lbuf.mask{ ... }

creates a mask without associating it with a specific buffer. Thus, you
can call buf:mask() passing an already created mask. For example:

ethernet_mask = lbuf.mask{ type = { ethertype_offset, ethertype_len }}
lldp_mask = lbuf.mask{ version = { version_offset, version_len }}

function filter(packet)
  packet:mask(ethernet_mask)
  if packet.type == 0x88CC then
    lldp_pdu = packet:segment(payload_offset):mask(lldp_mask)
    if packet.version < 1 then return DROP end
  end
  return PASS
end

The code is hosted at https://github.com/lneto/lbuf. Currently, only
array and raw access are working (partially).

I think this API could be useful for device-driver and protocol
prototyping. Looking forward to hearing from you.

Regards,
--
Lourival Vieira Neto
Christoph Badura
2013-10-14 13:02:36 UTC
First, I find the usage of the "buf" terminology confusing. In kernel
context I associate "buf" with the file system buffer cache "buf" structure.
Packet buffers are called "mbufs". I would appreciate it if the terminology
was consistent with the kernel or at least not confusing.

Also, having to switch mentally between zero-based arrays in the kernel C
code and 1-based arrays in the Lua code makes my head ache.
Post by Lourival Vieira Neto
lbuf_new(lua_State *L, void *buffer, size_t length, lua_Alloc free, bool net);
* creates a new lbuf userdatum and pushes it on the Lua stack. The net
flag indicates if it is necessary to perform endianness conversion.
What is "buffer" and how does it relate to mbufs? How do I create a new
"lbuf" from an mbuf? Or from an array of bytes?

In order to indicate that endianness conversion is necessary I need to
know the future uses of the buffer. Clairvoyance excepted, that is kinda
hard.

If you are going to make the buffers endianness aware, why not record the
endianness that the packet is encoded in? Then byteswapping can be
performed automatically depending on the consumer's endianness. I think
this way a lot of redundant code can be avoided.

And you don't describe under what circumstances endianness conversion is
performed.
Post by Lourival Vieira Neto
- array access (1)
lbuf:mask(alignment [, offset, length])
buf[ix] ~> accesses 'alignment' bits from 'alignment*(ix -1)+offset' position
buf:mask(3)
buf[3] ~> accesses 3 bits from bit-6 position
What does that mean? Does it return the top-most 2 bits from the first
byte plus the least significant bit from the second byte of the buffer?
What is 'length' for?
How does endianness conversion fit in?
Post by Lourival Vieira Neto
- array access (2)
buf:mask{ length_pos1, length_pos2, ... }
buf[ix] ~> accesses 'length_pos(ix)' bits from 'length_pos1 + ...
length_pos(ix-1)' position
buf:mask{ 2, 2, 32, 9 }
buf[2] ~> accesses 2 bits from bit-2 position
What exactly would "buf[3]" return? Please be explicit about whether you
are counting byte offsets or bit offsets. I can't figure that out from
your description.

Personally, the idea of making array access to the buffer depend on
state stored in the buffer does not look appealing to me. It prevents
buffers from being passed around because consumers don't know what they
will get back on array access.
Post by Lourival Vieira Neto
buf:mask{ field = { offset, length }, ... }
buf.field ~> 'field.length' bits from 'offset' position
This actually makes some sense to me.
Post by Lourival Vieira Neto
buf:segment(offset [, length])
returns a new lbuf corresponding a 'buf' segment.
What is a 'segment' actually?
Post by Lourival Vieira Neto
- mask reusing
lbuf.mask{ ... }
This makes sense again...
Post by Lourival Vieira Neto
function filter(packet)
  packet:mask(ethernet_mask)
  if packet.type == 0x88CC then
    lldp_pdu = packet:segment(payload_offset):mask(lldp_mask)
    if packet.version < 1 then return DROP end
  end
  return PASS
end
... except the code seems to be not runnable. Where does 'payload_offset'
come from? And don't you mean lldp_pdu.version?

I find it not helpful when the examples do not actually work.

--chris
Lourival Vieira Neto
2013-10-15 21:01:29 UTC
Hi Christoph,
Post by Christoph Badura
First, I find the usage of the "buf" terminology confusing. In kernel
context I associate "buf" with the file system buffer cache "buf" structure.
Packet buffers are called "mbufs". I would appreciate it if the terminology
was consistent with the kernel or at least not confusing.
This is due to my lack of creativity =). I'm quite open to naming suggestions.
Post by Christoph Badura
Also, having to switch mentally between zero-based arrays in the kernel C
code and 1-based arrays in the Lua code make my head ache.
It's something that doesn't bug me so much.. But, if necessary it
could be changed to 0-based in this userdata.
Post by Christoph Badura
Post by Lourival Vieira Neto
lbuf_new(lua_State *L, void *buffer, size_t length, lua_Alloc free, bool net);
* creates a new lbuf userdatum and pushes it on the Lua stack. The net
flag indicates if it is necessary to perform endianness conversion.
What is "buffer" and how does it relate to mbufs? How do I create a new
"lbuf" from an mbuf? Or from an array of bytes?
Note that non-contiguous buffers are still an open problem in lbuf. I
don't know if I should use a ptrdiff_t to pass the distance to the 'next'
field, a 'next()' to return the 'next' field, or something else.

However, you could create an lbuf from an mbuf header as follows:

lbuf_new(L, mbuf->m_data, mbuf->m_len, NULL, true);

or from an array:

uint8_t array[ N ];
/* 'false' means 'use the platform endianness' */
lbuf_new(L, (void *) array, N, NULL, false);

Then, you could call a Lua function passing this lbuf, for example:

lua_getglobal(L, "handler");
lbuf_new(L, mbuf->m_data, mbuf->m_len, NULL, true);
lua_pcall(L, 1, 0, 0);
Post by Christoph Badura
In order to indicate that endianness conversion is necessary I need to
know the future uses of the buffer. Clairvoyance excepted, that is kinda
hard.
It's a generic data structure that could be used to handle bit fields
or nonaligned data.
Post by Christoph Badura
If you are going to make the buffers endianness aware, why not record the
endianness that the packet is encoded in? Then byteswapping can be
performed automatically depending on the consumer's endianness. I think
this way a lot of redundant code can be avoided.
And you don't describe under what circumstances endianness conversion is
performed.
Yes, mea culpa =(. I wasn't clear about that. The 'net' flag was the way
I found to 'record' the buffer endianness: true if the buffer uses BE
(big endian) and false if it uses HE (host endian). It has the same
semantics as the hton* and ntoh* functions. I don't know if it would be
better to pass the endianness itself as a flag (e.g., enum { BIG_ENDIAN,
LITTLE_ENDIAN, HOST_ENDIAN }). What do you think?

So, if you set the net flag to true, then when you access a bit field
the conversion to and from big endian, if needed, is done automatically,
taking the smallest byte-aligned set of bits that contains the field.
For example:

buf:rawget(0, 9) ~> if the net flag is *true*: takes 16 bits from the
beginning of the buffer (as is); converts these 2 bytes from BE to HE
(if necessary); and returns these 2 bytes masked to preserve only the
most significant 9 bits (zeroing the remaining bits) and shifted to
LSB. If net is *false*: just returns the first 2 bytes masked and
shifted (without conversion). Then these 2 bytes are expanded to the
lua_Number type (int64_t in kernel).

That is:

a) If net flag is _true_ and the platform is LE:
   1- Takes 16 bits:
      [ b0 | b1 | b2 | b3 | b4 | b5 | b6 | b7 ][ b8 | b9 | b10 | b11 | b12 | b13 | b14 | b15 ]

   2- Converts it to LE:
      [ b8 | b9 | b10 | b11 | b12 | b13 | b14 | b15 ][ b0 | b1 | b2 | b3 | b4 | b5 | b6 | b7 ]

   3- Returns the first 2 bytes masked and shifted:
      [ b1 | b2 | b3 | b4 | b5 | b6 | b7 | b8 ][ 0 | 0 | 0 | 0 | 0 | 0 | 0 | b0 ]

b) If net flag is _false_ and the platform is LE:
   1- Takes 16 bits:
      [ b0 | b1 | b2 | b3 | b4 | b5 | b6 | b7 ][ b8 | b9 | b10 | b11 | b12 | b13 | b14 | b15 ]

   2- Returns the first 2 bytes masked and shifted:
      [ b9 | b10 | b11 | b12 | b13 | b14 | b15 | b0 ][ 0 | 0 | 0 | 0 | 0 | 0 | 0 | b8 ]

c) If net flag is _true or false_ and the platform is BE:
   1- Takes 16 bits:
      [ b0 | b1 | b2 | b3 | b4 | b5 | b6 | b7 ][ b8 | b9 | b10 | b11 | b12 | b13 | b14 | b15 ]

   2- Returns the first 2 bytes masked and shifted:
      [ 0 | 0 | 0 | 0 | 0 | 0 | 0 | b0 ][ b1 | b2 | b3 | b4 | b5 | b6 | b7 | b8 ]
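
The three cases above can be sketched in C roughly as follows. This illustrates the described behaviour and is not the lbuf implementation; rawget16 is a hypothetical name, and the sketch only covers fields that start at bit 0 and fit in 16 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of rawget(0, nbits) for 1 <= nbits <= 16.
 * 'buf' must hold at least 2 bytes. */
static uint16_t
rawget16(const uint8_t *buf, unsigned nbits, bool net)
{
	uint16_t v;

	assert(nbits >= 1 && nbits <= 16);
	if (net) {
		/* Buffer is big endian: assemble the value byte by byte,
		 * which yields the same result on any host (cases a and c). */
		v = (uint16_t)((buf[0] << 8) | buf[1]);
	} else {
		/* Buffer is host endian: take the two bytes as they are
		 * (case b on LE hosts, case c on BE hosts). */
		memcpy(&v, buf, sizeof(v));
	}
	/* Keep the nbits most significant bits, shifted down to the LSB. */
	return (uint16_t)(v >> (16 - nbits));
}
```

Assembling the big-endian value with shifts avoids needing to know the host byte order at all, which is one way to get the "convert if necessary" behaviour.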
Post by Christoph Badura
Post by Lourival Vieira Neto
- array access (1)
lbuf:mask(alignment [, offset, length])
buf[ix] ~> accesses 'alignment' bits from 'alignment*(ix -1)+offset' position
buf:mask(3)
buf[3] ~> accesses 3 bits from bit-6 position
What does that mean? Does it return the top-most 2 bits from the first
byte plus the least significant bit from the second byte of the buffer?
It means the least-most 2 bits from the first byte and the LSB from the second.
Post by Christoph Badura
What is 'length' for?
Offset and length could be used to impose boundaries to the mask. For
example, if you want to analyse a segment of the buffer that is
organized in a logical array of 2 bytes starting from the second byte
and that has 3 elements, you could do: buf:mask(16, 8, 3).
Post by Christoph Badura
How does endianness conversion fit in?
Endianness conversion is done using the smallest aligned number of
bits; in this case, 1 byte, to which endianness does not apply.
Post by Christoph Badura
Post by Lourival Vieira Neto
- array access (2)
buf:mask{ length_pos1, length_pos2, ... }
buf[ix] ~> accesses 'length_pos(ix)' bits from 'length_pos1 + ...
length_pos(ix-1)' position
buf:mask{ 2, 2, 32, 9 }
buf[2] ~> accesses 2 bits from bit-2 position
What exactly would "buf[3]" return. Please be explicit in whether you are
counting byte offsets or bit offsets. I can't figure that out from your
description.
It would return 32 bits (converted or not, depending on the 'net' flag)
from bit-4 (MSB-ordered).
mask{ ... } receives bit offsets and array access receives a mask field
index. I'm always counting bit offsets. Bytes are only used for
endianness conversion.
Post by Christoph Badura
Personally, the idea of making array access to the buffer depend on
state stored in the buffer does not look appealing to me. It prevents
buffers to be passed around because consumers don't know what they will
get back on array access.
I think it could be useful to access nonaligned and aligned data
easily without caring about naming fields.
Post by Christoph Badura
Post by Lourival Vieira Neto
buf:mask{ field = { offset, length }, ... }
buf.field ~> 'field.length' bits from 'offset' position
This actually makes some sense to me.
=)
Post by Christoph Badura
Post by Lourival Vieira Neto
buf:segment(offset [, length])
returns a new lbuf corresponding a 'buf' segment.
What is a 'segment' actually?
Segment is a sub-buffer. You could use just a portion of a main buffer
with another mask (e.g., to dissect a payload).
Post by Christoph Badura
Post by Lourival Vieira Neto
- mask reusing
lbuf.mask{ ... }
This makes sense again...
=)
Post by Christoph Badura
Post by Lourival Vieira Neto
function filter(packet)
  packet:mask(ethernet_mask)
  if packet.type == 0x88CC then
    lldp_pdu = packet:segment(payload_offset):mask(lldp_mask)
    if packet.version < 1 then return DROP end
  end
  return PASS
end
... except the code seems to be not runnable. Where does 'payload_offset'
come from?
It's a variable which could be set by the script itself or loaded by
the C module. I could use the value itself (like 0x88CC), but I just
wanted to save the time of reading the standard.
Post by Christoph Badura
And don't you mean lldp_pdu.version?
Yes, sorry about that.
Post by Christoph Badura
I find it not helpful when the examples do not actually work.
Well, the library is not ready yet. This example is just a draft to
discuss a concept. However, this fragment should be runnable when the
lib is implemented (except for the packet.version mistake).
Post by Christoph Badura
--chris
Regards,
--
Lourival Vieira Neto
Marc Balmer
2013-10-16 06:50:05 UTC
On 15.10.13 23:01, Lourival Vieira Neto wrote:

[...]
Post by Lourival Vieira Neto
Post by Christoph Badura
Also, having to switch mentally between zero-based arrays in the kernel C
code and 1-based arrays in the Lua code make my head ache.
It's something that doesn't bug me so much.. But, if necessary it
could be changed to 0-based in this userdata.
In C an array index is actually an offset from the top, so 0 is the
natural way to denote element nr. 1 in C. In Lua, a numeric array index
is not an offset, but the ordinal array position. So 1 is the natural
way to denote the first element.

Strictly speaking, it's actually C that is weird: Index n denotes array
element n + 1...

Following the principle of least astonishment, I would not recommend
starting to do 0 based stuff in Lua, a Lua programmer certainly expects
things to start at 1.

[...]
Lourival Vieira Neto
2013-10-16 14:41:58 UTC
Post by Marc Balmer
[...]
Post by Lourival Vieira Neto
Post by Christoph Badura
Also, having to switch mentally between zero-based arrays in the kernel C
code and 1-based arrays in the Lua code make my head ache.
It's something that doesn't bug me so much.. But, if necessary it
could be changed to 0-based in this userdata.
In C an array index is actually an offset from the top, so 0 is the
natural way to denote element nr. 1 in C. In Lua, a numeric array index
is not an offset, but the ordinal array position. So 1 is the natural
way to denote the first element.
Strictly speaking, it's actually C that is weird: Index n denotes array
element n + 1...
Following the principle of least astonishment, I would not recommend
starting to do 0 based stuff in Lua, a Lua programmer certainly expects
things to start at 1.
[...]
Indeed.
--
Lourival Vieira Neto
Aleksej Saushev
2013-10-16 16:56:18 UTC
Post by Marc Balmer
Post by Lourival Vieira Neto
Post by Christoph Badura
Also, having to switch mentally between zero-based arrays in the kernel C
code and 1-based arrays in the Lua code make my head ache.
It's something that doesn't bug me so much.. But, if necessary it
could be changed to 0-based in this userdata.
In C an array index is actually an offset from the top, so 0 is the
natural way to denote element nr. 1 in C. In Lua, a numeric array index
is not an offset, but the ordinal array position. So 1 is the natural
way to denote the first element.
Strictly speaking, it's actually C that is weird: Index n denotes array
element n + 1...
This depends on your background. If you studied or dealt with mathematical logic,
set theory, or foundations of mathematics, then you start counting natural numbers
(ordinals) from 0. In this respect C is more logical (pun intended) than Lua.
Post by Marc Balmer
Following the principle of least astonishment, I would not recommend
starting to do 0 based stuff in Lua, a Lua programmer certainly expects
things to start at 1.
It is hard to tell what is the least astonishing here. You propose Lua
as a language embedded into C rather than a separate one. I'd say that
the Lua designers made the wrong decision here.
--
BCE HA MOPE!
Mouse
2013-10-16 17:18:18 UTC
Post by Aleksej Saushev
[...0-origin vs 1-origin arrays...]
It is hard to tell what is the least astonishing here.
Well, least astonishing to whom, is really the question, it seems to
me. Certainly I, as a C coder with no Lua experience, would find
0-origin arrays less astonishing. Someone with the converse experience
would presumably have the opposite reaction.
Post by Aleksej Saushev
You propose Lua as a language embedded into C rather than separate
one. I'd say that Lua designers made wrong decision here.
Only if you think of Lua as being designed for embedding in C. It's
just as coherent to think of the mistake as being trying to wed a
language with 1-origin arrays with a language with 0-origin arrays.

/~\ The ASCII Mouse
\ / Ribbon Campaign
X Against HTML ***@rodents-montreal.org
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
Marc Balmer
2013-10-16 18:25:13 UTC
Post by Mouse
Post by Aleksej Saushev
[...0-origin vs 1-origin arrays...]
It is hard to tell what is the least astonishing here.
Well, least astonishing to whom, is really the question, it seems to
That is not a question. Least astonishing to Lua programmers, of course.

C coders like you are certainly not the target audience of Lua in
NetBSD. Lua is there to make it easy to explore NetBSD from a scripting
environment, amongst other uses.
Post by Mouse
me. Certainly I, as a C coder with no Lua experience, would find
0-origin arrays less astonishing. Someone with the converse experience
would presumably have the opposite reaction.
Post by Aleksej Saushev
You propose Lua as a language embedded into C rather than separate
one. I'd say that Lua designers made wrong decision here.
Only if you think of Lua as being designed for embedding in C. It's
just as coherent to think of the mistake as being trying to wed a
language with 1-origin arrays with a language with 0-origin arrays.
Lua has in fact been designed to be embedded. That is what it is all
about. That is why Lua is a library only.
Post by Mouse
/~\ The ASCII Mouse
\ / Ribbon Campaign
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
Lourival Vieira Neto
2013-10-17 01:10:31 UTC
Post by Mouse
Post by Aleksej Saushev
[...0-origin vs 1-origin arrays...]
(...)
Post by Aleksej Saushev
You propose Lua as a language embedded into C rather than separate
one. I'd say that Lua designers made wrong decision here.
Only if you think of Lua as being designed for embedding in C. It's
just as coherent to think of the mistake as being trying to wed a
language with 1-origin arrays with a language with 0-origin arrays.
I don't think it is a bad design. You just have to use a different
abstraction set when using the embedded language. If you are planning
to use the extension language with the same mindset you use with the
system language, then you are more likely to make bad design
decisions (IMHO). Lua tables are not intended to bind C arrays at all.
If you want to use C arrays, I think you should use C arrays (not
something else).
--
Lourival Vieira Neto
Alan Barrett
2013-10-16 10:56:29 UTC
Post by Lourival Vieira Neto
Post by Christoph Badura
In order to indicate that endianness conversion is necessary I need to
know the future uses of the buffer. Clairvoyance excepted, that is kinda
hard.
It's a generic data structure that could be used to handle bit fields
or nonaligned data.
Endianness should be a property of a field in a data structure,
not a property of the entire data structure. There might be a
mixture of big- and little-endian fields of different sizes in the
same data structure. It seemed to me from a superficial reading
that the proposed endianness flag in the Lua "buf" interface would
not handle that.

--apb (Alan Barrett)
Lourival Vieira Neto
2013-10-16 18:34:23 UTC
Post by Lourival Vieira Neto
Post by Christoph Badura
In order to indicate that endianness conversion is necessary I need to
know the future uses of the buffer. Clairvoyance excepted, that is kinda
hard.
It's a generic data structure that could be used to handle bit fields
or nonaligned data.
Endianness should be a property of a field in a data structure, not a
property of the entire data structure. There might be a mixture of big- and
little-endian fields of different sizes in the same data structure. It
seemed to me from a superficial reading that the proposed endianness flag in
the Lua "buf" interface would not handle that.
--apb (Alan Barrett)
I thought of using an optional parameter in mask fields to allow a
local decision. Thus, you could have a global behavior defined at lbuf
userdatum creation and a per-field behavior defined in the mask field.
For example:

lbuf.mask{ field = { offset, length, net }, ... }

Also, I was thinking about having a signedness flag (both global, at
buffer creation, and local, in the mask field).

Another syntax that I'm considering is to also have named parameters
(optionally) in the mask declaration:

lbuf.mask{ field = { __offset = offset, __length = length, __net = net, __signed = signed }, ... }
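
To illustrate the per-field idea in C terms, here is a rough sketch of a field descriptor that carries its own byte order. All names (byte_order, field16, read_field16) are hypothetical, and the sketch is limited to byte-aligned 16-bit fields for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum byte_order { ORDER_BIG, ORDER_LITTLE };

/* Hypothetical descriptor: each field records its own byte order, so one
 * buffer can mix big- and little-endian fields. */
struct field16 {
	size_t byte_offset;
	enum byte_order order;
};

static uint16_t
read_field16(const uint8_t *buf, struct field16 f)
{
	const uint8_t *p;

	assert(buf != NULL);
	p = buf + f.byte_offset;
	if (f.order == ORDER_BIG)
		return (uint16_t)((p[0] << 8) | p[1]);
	return (uint16_t)((p[1] << 8) | p[0]);
}
```

A buffer-wide default would then only be consulted for fields that do not specify an order themselves.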

Regards,
--
Lourival Vieira Neto
Christoph Badura
2013-10-20 22:05:58 UTC
Post by Lourival Vieira Neto
Post by Christoph Badura
Also, having to switch mentally between zero-based arrays in the kernel C
code and 1-based arrays in the Lua code make my head ache.
It's something that doesn't bug me so much.. But, if necessary it
could be changed to 0-based in this userdata.
When you create your own data structures, I guess it is a wash. You have
to adjust +/-1 in infrequent circumstances in either scenario.

But in this case you are creating a special-purpose language that operates
in a universe of zero-based arrays. And that's not only the kernel code.
Every Internet protocol specification that I remember uses zero-based
indexing. For someone dealing with both sides (the world and your Lua
library), it means constantly having to be alert and remember to do the
offset adjustment. That is a lot more mental work for anyone working
with this library.

If you use 1-based indices, talking to protocol people will be funny too:

``Anyone know why the flags in byte 6 of this packet are funny?''
``Sure, that's most likely because the flags are in byte 5.''

I think it is worth thinking hard about this.
From a cursory reading of the Wireshark Lua API, it seems to me they are
using 0-based indices too.
Post by Lourival Vieira Neto
Post by Christoph Badura
What is "buffer" and how does it relate to mbufs?
Note that non-contiguous buffers are still an open problem in lbuf. I
don't know if I should use a ptrdiff_t to pass the distance to the 'next'
field, a 'next()' to return the 'next' field, or something else.
You seem to be talking about implementation. I was talking about the
interface of the library.
Post by Lourival Vieira Neto
lbuf_new(L, mbuf->m_data, mbuf->m_len, NULL, true);
I don't think that is a good way. You say you want to inspect packet
data in the kernel. Well, the packet's data can be spread over a
chain of mbufs. Also, mbufs may have internal or external storage.
You don't want to deal with that as the user of your library.

As a user, I want an interface like this:
lbuf_from_mbuf(L, mbuf, NULL, true);

That would make the contents of the mbuf chain starting at "mbuf"
available in an array-of-bytes like fashion. Length isn't needed as
it is computed from the mbuf chain.
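
A rough C sketch of what "array-of-bytes like" access over a chain could look like. It uses a hypothetical stand-in structure rather than the real struct mbuf, keeping only a data pointer, a length and a next pointer (the roles played by m_data, m_len and m_next); chain_length and chain_byte are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an mbuf, reduced to what the sketch needs. */
struct seg {
	const uint8_t *data;	/* role of m_data */
	size_t len;		/* role of m_len */
	const struct seg *next;	/* role of m_next */
};

/* Total number of bytes in the chain; the caller never passes a length. */
static size_t
chain_length(const struct seg *s)
{
	size_t total = 0;

	for (; s != NULL; s = s->next)
		total += s->len;
	return total;
}

/* Byte at 0-based index 'ix', as if the chain were one contiguous array. */
static uint8_t
chain_byte(const struct seg *s, size_t ix)
{
	while (s != NULL && ix >= s->len) {
		ix -= s->len;
		s = s->next;
	}
	assert(s != NULL);	/* ix must be below chain_length() */
	return s->data[ix];
}
```

Walking the chain on every access is linear in the number of segments; a real binding would probably cache the current segment, but that is beyond this sketch.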
Post by Lourival Vieira Neto
Yes, mea culpa =(. I wasn't clear about that. 'net' flag was the way I
found to 'record' the buffer endianness. What means, true if the
buffer uses BE and false if it uses HE. It has the same semantics of
hton* and ntoh* functions. Don't know if it is better to pass the
endianness itself as a flag (e.g., enum { BIG_ENDIAN, LITTLE_ENDIAN,
HOST_ENDIAN }). What do you think?
For me the most convenient interface would be if I didn't have to
mention the host byteorder. Just record what byteorder the buffer is in,
and convert when appropriate. Alan made a good point. It may be
convenient and/or necessary to specify a different byteorder in a mask.
Post by Lourival Vieira Neto
buf:rawget(0, 9) ~> if net flag is *true*: takes 16 bits from
You know, I think rawget is badly named. "Raw" implies unmodified.
And byteswapping is a form of modification. Maybe you can find a better
name.
Post by Lourival Vieira Neto
Post by Christoph Badura
Post by Lourival Vieira Neto
buf:mask(3)
buf[3] ~> accesses 3 bits from bit-6 position
What does that mean? Does it return the top-most 2 bits from the first
byte plus the least significant bit from the second byte of the buffer?
It means the least-most 2 bits from the first byte and the LSB from the second.
I don't know what "least-most" means. But since you don't seem to
agree with my question, I assume you intend it to be the opposite of
"top-most", in which case your statement doesn't make sense to me.
Translated into normal terms it would return bits 0 and 1 of the first
byte and bit 0 of the second one. They are not even contiguous.
Post by Lourival Vieira Neto
Post by Christoph Badura
What is 'length' for?
Offset and length could be used to impose boundaries to the mask.
Don't tell it to me. Write it into the documentation. :-)
Post by Lourival Vieira Neto
Post by Christoph Badura
Post by Lourival Vieira Neto
buf:mask{ 2, 2, 32, 9 }
buf[2] ~> accesses 2 bits from bit-2 position
What exactly would "buf[3]" return. Please be explicit in whether you are
counting byte offsets or bit offsets. I can't figure that out from your
description.
It would return 32 bits (converted or not, depending on 'net' flag)
from bit-4 (MSB-ordered).
From "bit-4"? From your own use of 1-based indices don't you mean the
5th bit?
Post by Lourival Vieira Neto
Post by Christoph Badura
Personally, the idea of making array access to the buffer depend on
state stored in the buffer does not look appealing to me. It prevents
buffers to be passed around because consumers don't know what they will
get back on array access.
I think it could be useful to access nonaligned and aligned data
easily without caring about naming fields.
I guess I did not make myself understood. Storing that state globally
means that every function you hand an lbuf to has to set the global
mask before it accesses the lbuf, because it can't know if the current
mask is compatible with its own use. And after a call to a function, the
caller has to reset the mask, because it could have been changed.

So you end up with code like:

function filter1(packet)
  packet:mask(whatever)
  ...
  filter2(packet)
  packet:mask(whatever) -- restore our mask
  ...
  filter3(packet)
  packet:mask(whatever) -- restore our mask
  ...
end

function filter2(packet)
  packet:mask(something_else)
  ...
  filter4(packet)
  packet:mask(something_else)
  ...
end

function filter3(packet)
  packet:mask(entirely_different)
  ...
end

That, of course, is completely idiotic.

The newbuf = buf:mask() functionality completely avoids that problem and is
sufficient. I would provide only that interface.
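
The "return a new object instead of mutating" idea can be sketched in C as a small view structure that shares the bytes but carries its own mask. The names (field, view, view_with_mask) are hypothetical and the mask is reduced to a single field for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct field {
	size_t offset;	/* in bits */
	size_t length;	/* in bits */
};

/* Hypothetical view: shares the underlying bytes, owns its mask pointer. */
struct view {
	const uint8_t *data;
	size_t len;			/* in bytes */
	const struct field *mask;	/* layout applied by this view */
};

/* Returns a new view over the same bytes. The caller's view keeps its
 * own mask, so nothing needs to be restored after calling a function
 * that applies a different mask. */
static struct view
view_with_mask(struct view v, const struct field *mask)
{
	assert(v.data != NULL);
	v.mask = mask;	/* 'v' is a by-value copy of the caller's view */
	return v;
}
```

Because each function works on its own view, filter2 and filter3 in the example above could apply any mask they like without filter1 noticing.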

--chris
Marc Balmer
2013-10-24 07:41:38 UTC
Post by Christoph Badura
Post by Lourival Vieira Neto
Post by Christoph Badura
Also, having to switch mentally between zero-based arrays in the kernel C
code and 1-based arrays in the Lua code make my head ache.
It's something that doesn't bug me so much.. But, if necessary it
could be changed to 0-based in this userdata.
When you create your own data structures, I guess it is a wash. You have
to adjust +/-1 in infrequent circumstances in either scenario.
But in this case you are creating a special purpose language that operates
in universe of zero-based array. And that's not only the kernel code.
Every Internet protocol specification that I remember is using zero-based
indexing. For someone dealing with both sides (the world and your lua
library), it makes the difference between constantly having to be alert
to remember to do the offset adjustment. That is a lot more mental work
for anyone working with this library.
``Anyone know why the flags in byte 6 of this packet are funny?''
``Sure, that's most likely because the flags are in byte 5.''
I think it is worth thinking hard about this.
From a cursory reading of the Wireshark Lua API, it seems to me they are
using 0-based indices too.
It probably depends on whether you are accessing a Lua table (where you'd
expect 1-based) or an in-kernel data structure that is not strictly a
Lua table. If a Lua function is to mimic an existing C function, it
might indeed be better to use 0-based access, so as not to confuse
developers who are probably familiar with the corresponding C
function.

As this is mostly a matter of taste/source of confusion, it should be
documented. A man page for a Lua function that offers the same or
similar functionality as a C function should state whether it's 0-based
or 1-based, imo.

[...]
Lourival Vieira Neto
2013-11-04 20:26:15 UTC
Post by Marc Balmer
Post by Christoph Badura
Post by Lourival Vieira Neto
Post by Christoph Badura
Also, having to switch mentally between zero-based arrays in the kernel C
code and 1-based arrays in the Lua code make my head ache.
It's something that doesn't bug me so much.. But, if necessary it
could be changed to 0-based in this userdata.
When you create your own data structures, I guess it is a wash. You have
to adjust +/-1 in infrequent circumstances in either scenario.
But in this case you are creating a special purpose language that operates
in universe of zero-based array. And that's not only the kernel code.
Every Internet protocol specification that I remember is using zero-based
indexing. For someone dealing with both sides (the world and your lua
library), it makes the difference between constantly having to be alert
to remember to do the offset adjustment. That is a lot more mental work
for anyone working with this library.
``Anyone know why the flags in byte 6 of this packet are funny?''
``Sure, that's most likely because the flags are in byte 5.''
I think it is worth thinking hard about this.
From a cursory reading of the Wireshark Lua API, it seems to me they are
using 0-based indices too.
It probably depends on whether you are accessing a Lua table (where you'd
expect 1-based) or an in-kernel data structure that is not strictly a
Lua table. If a Lua function is to mimic an existing C function, it
might indeed be better to use 0-based access, so as not to confuse
developers who are probably familiar with the corresponding C
function.
As this is mostly a matter of taste/source of confusion, it should be
documented. A man page for a Lua function that offers the same or
similar functionality as a C function should state whether it's 0-based
or 1-based, imo.
Indeed.
--
Lourival Vieira Neto
Lourival Vieira Neto
2013-11-04 20:13:40 UTC
Permalink
Hi Christoph,

Firstly, thanks for your comments. I really appreciated that =).

BTW, I renamed lbuf to Lua bitwiser and removed support for "unnamed"
array access (buf:mask(alignment [, offset, length]) and buf:mask{
length_pos1, length_pos2, ... }). Instead, I introduced a new way to
provide array access (see below).
Post by Christoph Badura
Post by Lourival Vieira Neto
Post by Christoph Badura
Also, having to switch mentally between zero-based arrays in the kernel C
code and 1-based arrays in the Lua code make my head ache.
It's something that doesn't bug me so much.. But, if necessary it
could be changed to 0-based in this userdata.
When you create your own data structures, I guess it is a wash. You have
to adjust +/-1 in infrequent circumstances in either scenario.
But in this case you are creating a special purpose language that operates
in universe of zero-based array. And that's not only the kernel code.
Every Internet protocol specification that I remember is using zero-based
indexing. For someone dealing with both sides (the world and your lua
library), it makes the difference between constantly having to be alert
to remember to do the offset adjustment. That is a lot more mental work
for anyone working with this library.
``Anyone know why the flags in byte 6 of this packet are funny?''
``Sure, that's most likely because the flags are in byte 5.''
I think it is worth thinking hard about this.
Yes, I'll keep thinking about this. For now, I'm using 1-based indices
for Lua array access on buffers and 0-based offsets in bitmask
definitions.
Post by Christoph Badura
Post by Lourival Vieira Neto
Post by Christoph Badura
I what is "buffer" and how does it relate to mbufs?
Note, non-contiguous buffer still an open problem in lbuf. I don't
know if should use a ptrdiff_t to pass the distance to 'next' field,
a 'next()' to return 'next' field or something else.
You seem to be talking about implementation. I was talking about the
interface of the library.
Yes, sorry. At this stage, it is sometimes a little difficult to
separate these things.
Post by Christoph Badura
Post by Lourival Vieira Neto
lbuf_new(L, mbuf->m_data, mbuf->m_len, NULL, true);
I don't think that is a good way. You say you want to inspect packet
data in the kernel. Well, the packet's data can be spread over a
chain of mbufs. Also, mbufs may have internal or externel storage.
You don't want to deal with that as the user of your library.
lbuf_from_mbuf(L, mbuf, NULL, true);
I'm thinking of providing an interface like this in an adapter layer.
That way, we could use the Lua bitwiser library in other areas. I was just
giving an example of how to use it with the current implementation
(again, sorry for not separating the interface discussion from the
implementation).
Post by Christoph Badura
Post by Lourival Vieira Neto
Yes, mea culpa =(. I wasn't clear about that. 'net' flag was the way I
found to 'record' the buffer endianness. What means, true if the
buffer uses BE and false if it uses HE. It has the same semantics of
hton* and ntoh* functions. Don't know if it is better to pass the
endianness itself as a flag (e.g., enum { BIG_ENDIAN, LITTLE_ENDIAN,
HOST_ENDIAN }). What do you think?
For me the the most convenient interface would be if I didn't have to
mention the host byteorder. Just record what byteorder the buffer is in,
and convert when appropriate. Alan made a good point. It maybe be
convenient and/or necessary to specify a different byteorder in a mask.
I'm working on it. I'm thinking of the following form:

bitwiser.mask{ field = { offset, length, sign, endian, step }}, where
sign is a boolean, endian is a string like 'host', 'h', 'big', 'b',
'little', 'l', 'net' or 'n', and step is a number in the range [1, 64].

And (to allow parameters to be omitted or given in any order):

bitwiser.mask{ field =
{ __offset = offset, __length = length, __sign = sign, __endian =
endian, __step = step }}

The defaults are __sign = true, __endian = 'host' and __step = undef. If
step is present or length is omitted, the lib assumes that it is a
segment field, which means that accessing it returns a bitwiser.buffer
userdatum, which can itself be accessed like an array (using
step, if it is defined, or else the step of the original buffer to
determine the length of each element). It can also be masked to allow
field access. For example:

m = bitwiser.mask{
type = { 0, 4 },
flags = { 4, 4, __step = 1 },
payload = { 8 }
}

b = bitwiser.buffer{ 0xff, 0, 0xff, 0 } -- new buffers have step = 8, by default
b[1] --> 0xff

b:mask(m)

b.flags[1] = false --> unsets bit-4 (0-based)
b.flags[4] --> returns bit-7 (0-based), 1 in this case

b.payload:mask{ padding = { 0, 8 }, data = { 8, __step = 16 } }
b.payload.data[1] --> returns 2 bytes from bit-16 (0-based) of the original buffer,
-- 0x00ff or 0xff00 depending on platform endianness, in this case
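
To make the offsets in the example above concrete, here is a minimal pure-Lua sketch that reads the same bits from a plain table of bytes. It is not the bitwiser implementation: get_bits and get_u16 are hypothetical helper names, and the sketch assumes bits are numbered MSB-first within each byte and that the 16-bit field is byte-aligned.

```lua
-- Hypothetical helper (not part of bitwiser): read `length` bits starting at
-- the 0-based bit `offset` from a table of bytes, counting bits MSB-first.
function get_bits(bytes, offset, length)
  local value = 0
  for bit = offset, offset + length - 1 do
    local byte = bytes[math.floor(bit / 8) + 1]  -- Lua tables are 1-based
    local shift = 7 - (bit % 8)                  -- bit 0 is the MSB of its byte
    value = value * 2 + math.floor(byte / 2 ^ shift) % 2
  end
  return value
end

-- Hypothetical helper: read a byte-aligned 16-bit value in the given byte order.
function get_u16(bytes, offset, endian)
  local first = get_bits(bytes, offset, 8)
  local second = get_bits(bytes, offset + 8, 8)
  if endian == 'little' then
    first, second = second, first  -- the first byte in memory is the low byte
  end
  return first * 256 + second
end

local bytes = { 0xff, 0, 0xff, 0 }

print(get_bits(bytes, 0, 4))          -- 'type' field: 15
print(get_bits(bytes, 7, 1))          -- flags[4], i.e. bit-7 (0-based): 1
print(get_u16(bytes, 16, 'big'))      -- 65280 (0xff00)
print(get_u16(bytes, 16, 'little'))   -- 255 (0x00ff)
```

The last two lines show why the result of b.payload.data[1] depends on byte order when __endian is left at 'host'.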
Post by Christoph Badura
Post by Lourival Vieira Neto
buf:rawget(0, 9) ~> if net flag is *true*: takes 16 bits from
You know, I think rawget is badly named. "Raw" implies unmodified.
And byteswapping is a form of modification. Maybe you can find a better
name.
Yes, you're right. I changed that to :get(offset, length).
Post by Christoph Badura
Post by Lourival Vieira Neto
Post by Christoph Badura
Post by Lourival Vieira Neto
buf:mask(3)
buf[3] ~> accesses 3 bits from bit-6 position
What does that mean? Does it return the top-most 2 bits from the first
byte plus the least significant bit fom the second byte of the buffer?
It means the least-most 2 bits from the first byte and the LSB from the second.
I don't know what "least-most" means. But since you don't seem to
agree with my questions I assume you intend that to be the opposite of
"top-most" then your statement doesn't make sense to me. Translated
into normal terms it would return bits 0 and 1 of the first byte and bit
0 of the second one. They are not even contigous.
Really sorry about that. It was a brain-segfault (or "dyslexia"). I
intended to say the 2 least significant bits of the first byte
followed by the most significant bit of the second byte ([ b6 | b7 | b8
], 0-based). Anyway, I have dropped this kind of array access.
Post by Christoph Badura
Post by Lourival Vieira Neto
Post by Christoph Badura
What is 'length' for?
Offset and length could be used to impose boundaries to the mask.
Don't tell it to me. Write it into the documentation. :-)
Sure. As soon as I have one =).
Post by Christoph Badura
Post by Lourival Vieira Neto
Post by Christoph Badura
Post by Lourival Vieira Neto
buf:mask{ 2, 2, 32, 9 }
buf[2] ~> accesses 2 bits from bit-2 position
What exactly would "buf[3]" return. Please be explicit in whether you are
counting byte offsets or bit offsets. I can't figure that out from your
description.
It would return 32 bits (converted or not, depending on 'net' flag)
from bit-4 (MSB-ordered).
From "bit-4"? From your own use of 1-based indices don't you mean the
5th bit?
Yes, bit-4 0-based or 5th. I'm using 1-based for Lua array access and
0-based for bit offsets. Sorry, I should have been clearer about it.
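
The mix of 1-based array indices and 0-based bit offsets can be spelled out with a short sketch. It only illustrates the arithmetic of the (now removed) buf:mask{ length_pos1, length_pos2, ... } form; field_offset is a hypothetical helper, not a library function.

```lua
-- Hypothetical helper: given the list of field lengths (in bits) and a
-- 1-based field index, return the 0-based bit offset where that field starts.
function field_offset(lengths, ix)
  local offset = 0
  for i = 1, ix - 1 do
    offset = offset + lengths[i]  -- sum the lengths of all preceding fields
  end
  return offset
end

local lengths = { 2, 2, 32, 9 }
for ix = 1, #lengths do
  -- prints: 1-based index, 0-based bit offset, length in bits
  print(ix, field_offset(lengths, ix), lengths[ix])
end
```

With these lengths, index 2 starts at bit-2 and index 3 starts at bit-4 (the 5th bit), which matches the answers given above.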
Post by Christoph Badura
Post by Lourival Vieira Neto
Post by Christoph Badura
Personally, the idea of making array access to the buffer depend on
state stored in the buffer does not look appealing to me. It prevents
buffers to be passed around because consumers don't know what they will
get back on array access.
I think it could be useful to access nonaligned and aligned data
easily without caring about naming fields.
I guess I did not make myself understandable. Storing that state globally
means that every function you hand an lbuf too has to set the global
mask before it accesses the lbuf, because it can't know if the current
mask is compatible with its own use. And after call to a function, the
caller has to reset the mask, because it could have been changed.
function filter1(packet)
packet:mask(whatever)
...
filter2(packet)
packet:mask(whatever) -- restore our mask
...
filter3(packet)
packet:mask(whatever) -- restore our mask
...
end
function filter2(packet)
packet:mask(something_else)
...
filter4(packet)
packet:mask(something_else)
...
end
function filter3(packet)
packet:mask(entirely_different)
...
end
That, of course, is completely idiotic.
The newbuf = buf:mask() functionalty completely avoids that problem and is
sufficient. I would provide only that interface.
Sure, but it would also prevent reusing the buffer userdatum. I mean, you
couldn't change a buffer's mask once it is masked. Moreover, the first
buffer in the masking chain would never effectively be masked (the
mask would be applied only to the new one). It would be used only as
the raw data handler. I prefer to have another function (like
segment([offset, length])) that returns a new buffer userdatum pointing
to the same raw data. In this case you would have:

function filter1(packet)
packet:mask(whatever)
...
filter2(packet:segment()) -- passes a new userdatum containing the entire original data
-- don't need to restore packet's mask here
...
-- more commonly, passes a segment containing payload only
filterN(packet:segment(payload_offset))
...
-- or, if you have defined a payload field in the packet's mask
filterN(packet.payload)
...
end
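
As a rough illustration of the segment idea, the following pure-Lua sketch models a buffer as a view over a shared table of bytes. It is only a stand-in for the real userdatum: new_view, the View methods, the byte-granular offsets and the opaque string masks are all assumptions made for the example.

```lua
-- Hypothetical stand-in for the buffer userdatum: a view over shared raw bytes.
local View = {}
View.__index = View

-- Create a view covering `length` bytes of `raw`, starting at 0-based `offset`.
function new_view(raw, offset, length)
  offset = offset or 0
  return setmetatable({
    raw = raw,
    offset = offset,
    length = length or (#raw - offset),
    layout = nil,                 -- the mask currently applied to this view
  }, View)
end

-- Applying a mask changes only this view, never other views of the same data.
function View:mask(layout)
  self.layout = layout
end

-- Return a new view of the same raw data; offset is relative to this view.
function View:segment(offset, length)
  offset = offset or 0
  return new_view(self.raw, self.offset + offset, length or (self.length - offset))
end

-- 1-based byte access relative to the start of the view.
function View:byte(ix)
  return self.raw[self.offset + ix]
end

local packet = new_view({ 0x45, 0x00, 0xde, 0xad })
packet:mask('header layout')

local payload = packet:segment(2)   -- skips the first two bytes
payload:mask('payload layout')

print(packet.layout)                -- still 'header layout'
print(payload:byte(1) == 0xde)      -- true: same raw data, different start
```

Because each segment carries its own mask, a callee can remask what it receives without the caller having to restore anything afterwards.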

Regards
--
Lourival Vieira Neto
Alexander Nasonov
2013-10-15 21:52:35 UTC
Permalink
Post by Christoph Badura
Also, having to switch mentally between zero-based arrays in the kernel C
code and 1-based arrays in the Lua code make my head ache.
Yeah, I totally agree here. There are several other reasons why Lua will
not be in the same league as C in the kernel. But for some
projects, the classical separation between modules (in C) and scripting
(in Lua) works extremely well. This includes complex configurations where
you need to orchestrate many calls to C code, or complex tasks like
generating code for bpf or the now-defunct npf opcode.

Alex
Alexander Nasonov
2013-10-15 22:22:37 UTC
Permalink
Post by Lourival Vieira Neto
I'm developing a library to handle buffers in Lua, named lbuf. It is
been developed as part of my efforts to perform experimentation in
kernel network stack using Lua. Initially, I intended to bind mbuf to
allow, for example, to write protocols dissectors in Lua. For example,
function filter(packet)
if packet.field == value then return DROP end
return PASS
end
Thus, I started to design a Lua binding to mbuf inspired by '#pragma
pack' and bitfields of C lang. Then, I realized that this Lua library
could be useful to other kernel (and user-space) areas, such as device
drivers and user-level protocols. So, I started to develop this
binding generically as a independent library to give random access to
bits in a buffer. It is just in the early beginning, but I want to
share some thoughts.
I wonder if you looked at Lua support in Wireshark [1]? Unfortunately,
it's GPL and they even have a special section 'Beware the GPL' on wiki.

[1] http://wiki.wireshark.org/Lua

Alex
Lourival Vieira Neto
2013-10-15 23:00:26 UTC
Permalink
Post by Alexander Nasonov
Post by Lourival Vieira Neto
I'm developing a library to handle buffers in Lua, named lbuf. It is
been developed as part of my efforts to perform experimentation in
kernel network stack using Lua. Initially, I intended to bind mbuf to
allow, for example, to write protocols dissectors in Lua. For example,
function filter(packet)
if packet.field == value then return DROP end
return PASS
end
Thus, I started to design a Lua binding to mbuf inspired by '#pragma
pack' and bitfields of C lang. Then, I realized that this Lua library
could be useful to other kernel (and user-space) areas, such as device
drivers and user-level protocols. So, I started to develop this
binding generically as a independent library to give random access to
bits in a buffer. It is just in the early beginning, but I want to
share some thoughts.
I wonder if you looked at Lua support in Wireshark [1]? Unfortunately,
it's GPL and they even have a special section 'Beware the GPL' on wiki.
[1] http://wiki.wireshark.org/Lua
Alex
Yes. In fact, I have already implemented a Wireshark dissector in Lua
for a proprietary protocol that I was designing, inspired by ERP, to
detect network loops. WS Lua dissectors also served as inspiration.
However, I just used the API; I never looked at the binding
implementation.

Wireshark Lua dissectors are a good example of what can be done with
Lua in that sense. But I'm looking for a more generic API that
allows random bit access in a buffer using Lua table notation and that
could also be used to communicate with devices, for example. I think
(IMHO) that lbuf masks are more straightforward.

Regards,
--
Lourival Vieira Neto
Lourival Vieira Neto
2013-10-16 14:41:03 UTC
Permalink
Hi Justin,

On Tue, Oct 15, 2013 at 7:38 PM, Justin Cormack
On Thu, Oct 10, 2013 at 7:15 PM, Lourival Vieira Neto
Post by Lourival Vieira Neto
Hi folks,
It has been a long time since my GSoC project and though I have tried
to come back, I've experienced some personal issues. However, now I'm
coding again.
I'm developing a library to handle buffers in Lua, named lbuf. It is
been developed as part of my efforts to perform experimentation in
kernel network stack using Lua. Initially, I intended to bind mbuf to
allow, for example, to write protocols dissectors in Lua. For example,
function filter(packet)
if packet.field == value then return DROP end
return PASS
end
Thus, I started to design a Lua binding to mbuf inspired by '#pragma
pack' and bitfields of C lang. Then, I realized that this Lua library
could be useful to other kernel (and user-space) areas, such as device
drivers and user-level protocols. So, I started to develop this
binding generically as a independent library to give random access to
bits in a buffer. It is just in the early beginning, but I want to
share some thoughts.
I have been using the luajit ffi and luaffi, which let you directly
use C structs (with bitfields) in Lua to do this. It makes it easier
to reuse stuff that is already defined in C. (luaffi is not in its
current state portable but my plan is to strip out the non portable
bits, which are the function call support).
Justin
I have never used luaffi. It sounds very interesting and I think it could
be very useful for binding already defined C structs, but my purpose is to
dynamically define data layouts using Lua syntax (without parsing C
code).

Regards,
--
Lourival Vieira Neto
Marc Balmer
2013-10-16 14:53:58 UTC
Permalink
Post by Lourival Vieira Neto
Hi Justin,
On Tue, Oct 15, 2013 at 7:38 PM, Justin Cormack
On Thu, Oct 10, 2013 at 7:15 PM, Lourival Vieira Neto
Post by Lourival Vieira Neto
Hi folks,
It has been a long time since my GSoC project and though I have tried
to come back, I've experienced some personal issues. However, now I'm
coding again.
I'm developing a library to handle buffers in Lua, named lbuf. It is
been developed as part of my efforts to perform experimentation in
kernel network stack using Lua. Initially, I intended to bind mbuf to
allow, for example, to write protocols dissectors in Lua. For example,
function filter(packet)
if packet.field == value then return DROP end
return PASS
end
Thus, I started to design a Lua binding to mbuf inspired by '#pragma
pack' and bitfields of C lang. Then, I realized that this Lua library
could be useful to other kernel (and user-space) areas, such as device
drivers and user-level protocols. So, I started to develop this
binding generically as a independent library to give random access to
bits in a buffer. It is just in the early beginning, but I want to
share some thoughts.
I have been using the luajit ffi and luaffi, which let you directly
use C structs (with bitfields) in Lua to do this. It makes it easier
to reuse stuff that is already defined in C. (luaffi is not in its
current state portable but my plan is to strip out the non portable
bits, which are the function call support).
Justin
I never used luaffi. It sounds very interesting and I think it could
be very useful to bind already defined C structs, but my purpose is to
dynamically define data layouts using Lua syntax (without parsing C
code).
FFI in the kernel can be dangerous. Pure Lua is a perfect confinement
for code, but with an FFI a Lua script can access almost anything in the
kernel. One has to think twice about whether one wants that.

Well, assuming it would be a module, I would not have to load it if I
don't want to.
Justin Cormack
2013-10-16 14:45:39 UTC
Permalink
Post by Lourival Vieira Neto
Hi Justin,
On Tue, Oct 15, 2013 at 7:38 PM, Justin Cormack
On Thu, Oct 10, 2013 at 7:15 PM, Lourival Vieira Neto
Post by Lourival Vieira Neto
Hi folks,
It has been a long time since my GSoC project and though I have tried
to come back, I've experienced some personal issues. However, now I'm
coding again.
I'm developing a library to handle buffers in Lua, named lbuf. It is
been developed as part of my efforts to perform experimentation in
kernel network stack using Lua. Initially, I intended to bind mbuf to
allow, for example, to write protocols dissectors in Lua. For example,
function filter(packet)
if packet.field == value then return DROP end
return PASS
end
Thus, I started to design a Lua binding to mbuf inspired by '#pragma
pack' and bitfields of C lang. Then, I realized that this Lua library
could be useful to other kernel (and user-space) areas, such as device
drivers and user-level protocols. So, I started to develop this
binding generically as a independent library to give random access to
bits in a buffer. It is just in the early beginning, but I want to
share some thoughts.
I have been using the luajit ffi and luaffi, which let you directly
use C structs (with bitfields) in Lua to do this. It makes it easier
to reuse stuff that is already defined in C. (luaffi is not in its
current state portable but my plan is to strip out the non portable
bits, which are the function call support).
Justin
I never used luaffi. It sounds very interesting and I think it could
be very useful to bind already defined C structs, but my purpose is to
dynamically define data layouts using Lua syntax (without parsing C
code).
Yes, absolutely, it makes more sense if it is already defined in C. For
parsing binary stuff I would look at Erlang for inspiration too; it is one
of the nicer designs.

Justin
Lourival Vieira Neto
2013-10-16 18:39:08 UTC
Permalink
On Wed, Oct 16, 2013 at 11:45 AM, Justin Cormack
Post by Justin Cormack
(...)
Yes absolutely it makes more sense if already defined in C. For parsing
binary stuff I would look at Erlang for inspiration too, it is one of the
nicer designs.
Justin
I have never gone that far with Erlang. It looks really interesting [1]. I'll
take a deeper look later. Thanks!

Regards,
--
Lourival Vieira Neto