I really like rsc's libtask and have managed to hide it in a few products.
As for your question: What architecture? Any runtime available?
Personally, I've used libtask on ARM/x86 under Linux/OSX... hardly "bare metal" though.
The current implementation depends mostly on the ucontext API, plus Berkeley sockets for the networking parts.
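To give a feel for what "depends on ucontext" means in practice, here is a minimal sketch of a cooperative task switch built directly on getcontext/makecontext/swapcontext. This is not libtask's code: the names run_task, task_body and STACK_SIZE are made up for illustration, and it assumes a hosted Linux/glibc-style environment where <ucontext.h> is available. On bare metal you would need your own replacement for these three calls.

```c
#include <stdlib.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)   /* hypothetical per-task stack size */

static ucontext_t main_ctx;      /* the "scheduler" context */
static ucontext_t task_ctx;      /* the single demo task */
static int steps_wanted;
static int steps_done;
static int finished;

/* Task body: do one unit of work, then yield back to the scheduler. */
static void task_body(void)
{
    for (int i = 0; i < steps_wanted; i++) {
        steps_done++;
        swapcontext(&task_ctx, &main_ctx);   /* cooperative yield */
    }
    finished = 1;
    /* Returning here resumes task_ctx.uc_link, i.e. main_ctx. */
}

/*
 * Run the task to completion and return how many times it had to be
 * resumed, or -1 on failure. Each yield costs one resume, plus a final
 * resume that lets the task fall off the end of task_body.
 */
int run_task(int steps)
{
    char *stack = malloc(STACK_SIZE);
    int resumes = 0;

    if (stack == NULL)
        return -1;

    steps_wanted = steps;
    steps_done = 0;
    finished = 0;

    if (getcontext(&task_ctx) == -1) {
        free(stack);
        return -1;
    }
    task_ctx.uc_stack.ss_sp = stack;         /* task runs on its own stack */
    task_ctx.uc_stack.ss_size = STACK_SIZE;
    task_ctx.uc_link = &main_ctx;            /* where to go when the task returns */
    makecontext(&task_ctx, task_body, 0);

    while (!finished) {
        swapcontext(&main_ctx, &task_ctx);   /* save scheduler, resume task */
        resumes++;
    }

    free(stack);
    return resumes;
}
```

Calling run_task(3) from main resumes the task four times: three yields plus the last resume that lets it finish. The point is that everything here leans on the C library saving and restoring register state and switching stacks, which is exactly the piece you would have to supply yourself without a runtime.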