mailing list of musl libc
 help / color / mirror / code / Atom feed
* LUA + musl, garbage collection issue?
@ 2014-09-21  2:41 Scott Valentine
  2014-09-21  4:38 ` Rich Felker
  0 siblings, 1 reply; 7+ messages in thread
From: Scott Valentine @ 2014-09-21  2:41 UTC (permalink / raw)
  To: lua-l, musl

I have been testing OpenWRT builds using musl as the libc on arm and mips targets, and I have run into an issue which appears to be related to lua.
Essentially, when using the luci web interface to upload a file to the device, I am running into a memory exhaustion problem. A 6 MB upload results in nearly 70MB of memory allocation.
On smaller targets (i.e. 64MB RAM), it pretty much just locks up the system until the watchdog timer expires and reboots.
On bigger targets with enough RAM, the process completes and the memory is freed, but I think this is because the lua process exits.

On eglibc and uclibc builds I only see marginal memory increases during the upload, mostly do to the file getting written to tmpfs.

I'm having trouble tracking down where the issue is, but it looks like lua exclusively uses realloc for managing memory. Is this correct?

I noticed that in order to free memory, it basically calls realloc with 0 as the new size. Is this something musl doesn't handle well?

I'm trying a rebuild with a check for n == 0 in musl's realloc function to just free the pointer, and I'll report back.

What is "the right thing to do" to fix this? Should lua not be using realloc to free memory, or should musl handle the case better, if, in fact this is the problem?

Regards,
-Scott V.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: LUA + musl, garbage collection issue?
  2014-09-21  2:41 LUA + musl, garbage collection issue? Scott Valentine
@ 2014-09-21  4:38 ` Rich Felker
  2014-09-21  9:58   ` Scott Valentine
  2014-09-21 10:16   ` Justin Cormack
  0 siblings, 2 replies; 7+ messages in thread
From: Rich Felker @ 2014-09-21  4:38 UTC (permalink / raw)
  To: musl

On Sat, Sep 20, 2014 at 04:41:14PM -1000, Scott Valentine wrote:
> I noticed that in order to free memory, it basically calls realloc
> with 0 as the new size. Is this something musl doesn't handle well?
> 
> I'm trying a rebuild with a check for n == 0 in musl's realloc
> function to just free the pointer, and I'll report back.
> 
> What is "the right thing to do" to fix this? Should lua not be using
> realloc to free memory, or should musl handle the case better, if,
> in fact this is the problem?

This is a bug in lua; it's depending on a bug in glibc. POSIX attempts
to allow the glibc behavior (see the text here:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/realloc.html)
but this allowance is invalid since it conflicts with the requirements
of ISO C (and, as you can see in the above link, "The functionality
described on this reference page is aligned with the ISO C standard.
Any conflict between the requirements described here and the ISO C
standard is unintentional. This volume of POSIX.1-2008 defers to the
ISO C standard.") The relevant ISO C requirement is:

"The realloc function returns a pointer to the new object (which may
have the same value as a pointer to the old object), or a null pointer
if the new object could not be allocated."

In particular, the only way realloc is permitted to return 0 is if the
operation failed, in which case the old pointer is still valid (not
freed).

But even if the glibc behavior weren't conflicting with the C
standard, it's still not valid for an application to assume it, and
it's still undesirable behavior (because it's inconsistent with
glibc's malloc, which returns a non-null, unique pointer for each
malloc(0), and because it makes it impossible for applications to
reliably determine whether the operation succeeded or failed).

The whole situation is a big enough mess that I feel I can safely say
that any application that ever passes 0 as the size argument to
realloc has potentially-serious bugs. So if lua wants to free memory,
it just needs to call free.

Rich


^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: LUA + musl, garbage collection issue?
  2014-09-21  4:38 ` Rich Felker
@ 2014-09-21  9:58   ` Scott Valentine
  2014-09-21 10:16   ` Justin Cormack
  1 sibling, 0 replies; 7+ messages in thread
From: Scott Valentine @ 2014-09-21  9:58 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 3342 bytes --]



> Date: Sun, 21 Sep 2014 00:38:31 -0400
> From: dalias@libc.org
> To: musl@lists.openwall.com
> Subject: Re: [musl] LUA + musl, garbage collection issue?
> 
> On Sat, Sep 20, 2014 at 04:41:14PM -1000, Scott Valentine wrote:
> > I noticed that in order to free memory, it basically calls realloc
> > with 0 as the new size. Is this something musl doesn't handle well?
> > 
> > I'm trying a rebuild with a check for n == 0 in musl's realloc
> > function to just free the pointer, and I'll report back.
> > 
> > What is "the right thing to do" to fix this? Should lua not be using
> > realloc to free memory, or should musl handle the case better, if,
> > in fact this is the problem?
> 
> This is a bug in lua; it's depending on a bug in glibc. POSIX attempts
> to allow the glibc behavior (see the text here:
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/realloc.html)
> but this allowance is invalid since it conflicts with the requirements
> of ISO C (and, as you can see in the above link, "The functionality
> described on this reference page is aligned with the ISO C standard.
> Any conflict between the requirements described here and the ISO C
> standard is unintentional. This volume of POSIX.1-2008 defers to the
> ISO C standard.") The relevant ISO C requirement is:
> 
> "The realloc function returns a pointer to the new object (which may
> have the same value as a pointer to the old object), or a null pointer
> if the new object could not be allocated."
> 
> In particular, the only way realloc is permitted to return 0 is if the
> operation failed, in which case the old pointer is still valid (not
> freed).
> 
> But even if the glibc behavior weren't conflicting with the C
> standard, it's still not valid for an application to assume it, and
> it's still undesirable behavior (because it's inconsistent with
> glibc's malloc, which returns a non-null, unique pointer for each
> malloc(0), and because it makes it impossible for applications to
> reliably determine whether the operation succeeded or failed).
> 
> The whole situation is a big enough mess that I feel I can safely say
> that any application that ever passes 0 as the size argument to
> realloc has potentially-serious bugs. So if lua wants to free memory,
> it just needs to call free.
> 
> Rich
Interestingly, I added the explicit free and return 0 at the top of musl's realloc function, but I am still seeing the same behavior, so now I am scratching my head. I've eliminated most pieces of the stack as sources of the problem. I can use scp to copy the file without issues, uhttpd pretty much never mallocs anything, I turned off ssl, and luci is almost entire written in lua. It does use the nixio library to actually write the file, but that doesn't really malloc anything and is really just a lightweight lua to native wrapper.
I'm going to try valgrind on the mips target to see who is causing trouble.
Could this possibly have something to do with the read/write stdio functions in musl? I can't really see how, but at present that is the only other implementation specific feature I can find that is relevant. Well, that and musl uses BUFSIZ = 1024 and glibc uses BUFSIZ = 8192. LUA references this variable directly, but it is really only a default.
Thanks,-Scott V. 		 	   		  

[-- Attachment #2: Type: text/html, Size: 3941 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: LUA + musl, garbage collection issue?
  2014-09-21  4:38 ` Rich Felker
  2014-09-21  9:58   ` Scott Valentine
@ 2014-09-21 10:16   ` Justin Cormack
  2014-09-24  5:25     ` Scott Valentine
  1 sibling, 1 reply; 7+ messages in thread
From: Justin Cormack @ 2014-09-21 10:16 UTC (permalink / raw)
  To: musl

On Sun, Sep 21, 2014 at 5:38 AM, Rich Felker <dalias@libc.org> wrote:
> On Sat, Sep 20, 2014 at 04:41:14PM -1000, Scott Valentine wrote:
>> I noticed that in order to free memory, it basically calls realloc
>> with 0 as the new size. Is this something musl doesn't handle well?
>>
>> I'm trying a rebuild with a check for n == 0 in musl's realloc
>> function to just free the pointer, and I'll report back.
>>
>> What is "the right thing to do" to fix this? Should lua not be using
>> realloc to free memory, or should musl handle the case better, if,
>> in fact this is the problem?
>
> This is a bug in lua; it's depending on a bug in glibc. POSIX attempts

As pointed out on lua-l but not copied here, Lua is not doing this, it
does call free() in the 0 case, so something else is the issue...

(openwrt does use a patched Lua, might be worth testing with upstream).

Justin


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: LUA + musl, garbage collection issue?
  2014-09-21 10:16   ` Justin Cormack
@ 2014-09-24  5:25     ` Scott Valentine
  2014-09-24  5:50       ` Szabolcs Nagy
  0 siblings, 1 reply; 7+ messages in thread
From: Scott Valentine @ 2014-09-24  5:25 UTC (permalink / raw)
  To: musl; +Cc: Justin Cormack

On Sunday, September 21, 2014 11:16:13 AM Justin Cormack wrote:
> On Sun, Sep 21, 2014 at 5:38 AM, Rich Felker <dalias@libc.org> wrote:
> > On Sat, Sep 20, 2014 at 04:41:14PM -1000, Scott Valentine wrote:
> >> I noticed that in order to free memory, it basically calls realloc
> >> with 0 as the new size. Is this something musl doesn't handle well?
> >>
> >> I'm trying a rebuild with a check for n == 0 in musl's realloc
> >> function to just free the pointer, and I'll report back.
> >>
> >> What is "the right thing to do" to fix this? Should lua not be using
> >> realloc to free memory, or should musl handle the case better, if,
> >> in fact this is the problem?
> >
> > This is a bug in lua; it's depending on a bug in glibc. POSIX attempts
> 
> As pointed out on lua-l but not copied here, Lua is not doing this, it
> does call free() in the 0 case, so something else is the issue...
> 
> (openwrt does use a patched Lua, might be worth testing with upstream).
> 
> Justin

I see now that it does indeed call free. I was looking at frealloc in the lua source.

In any case, this has been a nasty issue to track down. I have surely traced it to the following code block in luci (by process of elimination):

        local fp
        luci.http.setfilehandler(
                function(meta, chunk, eof)
                        if not fp then
                                if meta and meta.name == "image" then
                                        fp = io.open(image_tmp, "w")
                                else
                                        fp = io.popen(restore_tmp, "w")
                                end
                        end
                        if chunk then
                                fp:write(chunk)
                        end
                        if eof then
                                fp:close()
                        end
                end
        )


Here, "chunk" is a 2048 byte string, and the library calls are to nixio:


static int nixio_file_write(lua_State *L) {
        int fd = nixio__checkfd(L, 1);
        size_t len;
        ssize_t sent;
        const char *data = luaL_checklstring(L, 2, &len);

        if (lua_gettop(L) > 2) {
                int offset = luaL_optint(L, 3, 0);
                if (offset) {
                        if (offset < len) {
                                data += offset;
                                len -= offset;
                        } else {
                                len = 0;
                        }
                }

                unsigned int wlen = luaL_optint(L, 4, len);
                if (wlen < len) {
                        len = wlen;
                }
        }

        do {
                sent = write(fd, data, len);
        } while(sent == -1 && errno == EINTR);
        if (sent >= 0) {
                lua_pushinteger(L, sent);
                return 1;
        } else {
                return nixio__perror(L);
        }
}

So that all looks pretty innocuous... 

Right now, my current work around that actually fixes the issue is to #define BUFSIZ 8192 in musl's stdio.h file.

I have tried increasing LUAL_BUFFERSIZE in luaconf.h (it defaults to BUFSIZ) and reducing the size of "chunk" to as small as 512 bytes, but neither approach fixes the problem.
Also, I don't think LUAL_BUFFERSIZE is relevant to the problem. Any comment?

Either way, it is still unclear as to why the problem occurs, and I can only conjecture because I have no good debugging tools for a musl build.

But basically, I am thinking it partly has to do with the stdio write implementation in musl (i.e. commiting write as soon as the buffer is full).

I'm still not sure where to look to find a proper fix though, I mean essentially it is writing the file to RAM anyway (tmpfs), so the buffer commit should be pretty fast.
Ideas?

When I have time, I'll do a build against eglibc with a reduced BUFSIZ = 1024 (musl's default) and see if the problem is reproduced.

-Scott V.

-- 



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: LUA + musl, garbage collection issue?
  2014-09-24  5:25     ` Scott Valentine
@ 2014-09-24  5:50       ` Szabolcs Nagy
  2014-09-24  6:14         ` Scott Valentine
  0 siblings, 1 reply; 7+ messages in thread
From: Szabolcs Nagy @ 2014-09-24  5:50 UTC (permalink / raw)
  To: musl; +Cc: Justin Cormack

* Scott Valentine <scottvalen@hotmail.com> [2014-09-23 19:25:47 -1000]:
> 
> In any case, this has been a nasty issue to track down. I have surely traced it to the following code block in luci (by process of elimination):
> 
>         local fp
>         luci.http.setfilehandler(
>                 function(meta, chunk, eof)
>                         if not fp then
>                                 if meta and meta.name == "image" then
>                                         fp = io.open(image_tmp, "w")
>                                 else
>                                         fp = io.popen(restore_tmp, "w")
>                                 end
>                         end
>                         if chunk then
>                                 fp:write(chunk)
>                         end
>                         if eof then
>                                 fp:close()
>                         end
>                 end
>         )
> 
> 
> Here, "chunk" is a 2048 byte string, and the library calls are to nixio:
> 

are you sure the nixio function is called?

if fp is not set then io.open is called which should
use libc fopen, so fp:write should be a wrapper around
fwrite

if the code really calls the nixio function below then
there is no stdio involved, it directly writes to an fd

> static int nixio_file_write(lua_State *L) {
>         int fd = nixio__checkfd(L, 1);
>         size_t len;
>         ssize_t sent;
>         const char *data = luaL_checklstring(L, 2, &len);
> 
>         if (lua_gettop(L) > 2) {
>                 int offset = luaL_optint(L, 3, 0);
>                 if (offset) {
>                         if (offset < len) {
>                                 data += offset;
>                                 len -= offset;
>                         } else {
>                                 len = 0;
>                         }
>                 }
> 
>                 unsigned int wlen = luaL_optint(L, 4, len);
>                 if (wlen < len) {
>                         len = wlen;
>                 }
>         }
> 
>         do {
>                 sent = write(fd, data, len);
>         } while(sent == -1 && errno == EINTR);
>         if (sent >= 0) {
>                 lua_pushinteger(L, sent);
>                 return 1;
>         } else {
>                 return nixio__perror(L);
>         }
> }
> 


> When I have time, I'll do a build against eglibc with a reduced BUFSIZ = 1024 (musl's default) and see if the problem is reproduced.
> 

i think you should try to reproduce the bug with a minimal
test-case where either stdio (io.*) is called or nixio only


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: LUA + musl, garbage collection issue?
  2014-09-24  5:50       ` Szabolcs Nagy
@ 2014-09-24  6:14         ` Scott Valentine
  0 siblings, 0 replies; 7+ messages in thread
From: Scott Valentine @ 2014-09-24  6:14 UTC (permalink / raw)
  To: musl; +Cc: Szabolcs Nagy, Justin Cormack

On Wednesday, September 24, 2014 07:50:01 AM Szabolcs Nagy wrote:
> * Scott Valentine <scottvalen@hotmail.com> [2014-09-23 19:25:47 -1000]:
> > 
> > In any case, this has been a nasty issue to track down. I have surely traced it to the following code block in luci (by process of elimination):
> > 
> >         local fp
> >         luci.http.setfilehandler(
> >                 function(meta, chunk, eof)
> >                         if not fp then
> >                                 if meta and meta.name == "image" then
> >                                         fp = io.open(image_tmp, "w")
> >                                 else
> >                                         fp = io.popen(restore_tmp, "w")
> >                                 end
> >                         end
> >                         if chunk then
> >                                 fp:write(chunk)
> >                         end
> >                         if eof then
> >                                 fp:close()
> >                         end
> >                 end
> >         )
> > 
> > 
> > Here, "chunk" is a 2048 byte string, and the library calls are to nixio:
> > 
> 
> are you sure the nixio function is called?
> 
> if fp is not set then io.open is called which should
> use libc fopen, so fp:write should be a wrapper around
> fwrite

> if the code really calls the nixio function below then
> there is no stdio involved, it directly writes to an fd

You are correct... My brain must be getting tired. 
 
> > static int nixio_file_write(lua_State *L) {
> >         int fd = nixio__checkfd(L, 1);
> >         size_t len;
> >         ssize_t sent;
> >         const char *data = luaL_checklstring(L, 2, &len);
> > 
> >         if (lua_gettop(L) > 2) {
> >                 int offset = luaL_optint(L, 3, 0);
> >                 if (offset) {
> >                         if (offset < len) {
> >                                 data += offset;
> >                                 len -= offset;
> >                         } else {
> >                                 len = 0;
> >                         }
> >                 }
> > 
> >                 unsigned int wlen = luaL_optint(L, 4, len);
> >                 if (wlen < len) {
> >                         len = wlen;
> >                 }
> >         }
> > 
> >         do {
> >                 sent = write(fd, data, len);
> >         } while(sent == -1 && errno == EINTR);
> >         if (sent >= 0) {
> >                 lua_pushinteger(L, sent);
> >                 return 1;
> >         } else {
> >                 return nixio__perror(L);
> >         }
> > }
> > 
> 
> 
> > When I have time, I'll do a build against eglibc with a reduced BUFSIZ = 1024 (musl's default) and see if the problem is reproduced.
> > 
> 
> i think you should try to reproduce the bug with a minimal
> test-case where either stdio (io.*) is called or nixio only

Yes, now that you've pointed out the above, I have something I can work with. I should be able to write a pure lua script to try and reproduce the problem.

-Scott V.

-- 



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2014-09-24  6:14 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-21  2:41 LUA + musl, garbage collection issue? Scott Valentine
2014-09-21  4:38 ` Rich Felker
2014-09-21  9:58   ` Scott Valentine
2014-09-21 10:16   ` Justin Cormack
2014-09-24  5:25     ` Scott Valentine
2014-09-24  5:50       ` Szabolcs Nagy
2014-09-24  6:14         ` Scott Valentine

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).