mailing list of musl libc
* [PATCH v3] Build process uses script to add CFI directives to x86 asm
@ 2015-05-13 17:54 Alex Dowad
  2015-05-13 19:22 ` Szabolcs Nagy
  0 siblings, 1 reply; 5+ messages in thread
From: Alex Dowad @ 2015-05-13 17:54 UTC (permalink / raw)
  To: musl

Some functions implemented in asm need to use EBP for purposes other than acting
as a frame pointer. (Notably, it is used for the 6th argument to syscalls with
6 arguments.) Without frame pointers, GDB can only show backtraces if it gets
CFI information from a .debug_frame or .eh_frame ELF section.

Rather than littering our asm with ugly .cfi directives, use an awk script to
insert them in the right places during the build process, so GDB can keep track of
where the current stack frame is relative to the stack pointer. This means GDB can
produce beautiful stack traces at any given point when single-stepping through asm
functions.

Additionally, when registers are saved on the stack and later overwritten, emit
.cfi directives so GDB will know where they were saved relative to the stack
pointer. This way, when you look back up the stack from within an asm function,
you can still reliably print the values of local variables in the caller.

If this awk script were to understand every possible wild and crazy contortion that
an asm programmer can do with the stack and registers, and always emit the exact
.cfi directives needed for GDB to know what the register values were in the
preceding stack frame, it would necessarily be as complex as a full x86 emulator.
That way lies madness.

Hence, we assume that the stack pointer will _only_ ever be adjusted using push/pop
or else add/sub with a constant. We do not attempt to detect every possible way that
a register value could be saved for later use.
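
To make the assumption above concrete, here is a minimal sketch of the
stack-pointer bookkeeping on its own. It is a stripped-down illustration, not
the script in this patch: the names track_cfa and imm are hypothetical, it
assumes 32-bit pushes and pops and decimal constants only, and it omits the
.cfi_startproc/.cfi_endproc bracketing that GAS requires around these
directives.

```sh
#!/bin/sh
# Hypothetical illustration only (not tools/add-cfi.awk.i386): echo each asm
# line and, after any instruction that moves %esp, report how far the CFA
# offset changed. Only push/pop and add/sub with a decimal constant are handled.
track_cfa() {
  LC_ALL=C awk '
    # imm(): return the decimal constant that follows "$" on the current line
    function imm() {
      match($0, /[$][0-9]+/)
      return substr($0, RSTART + 1, RLENGTH - 1)
    }
    { print }
    /^[ \t]*pushl?[ \t]/                    { print ".cfi_adjust_cfa_offset 4" }
    /^[ \t]*popl?[ \t]/                     { print ".cfi_adjust_cfa_offset -4" }
    /^[ \t]*subl?[ \t]+[$][0-9]+,[ \t]*%esp/ { print ".cfi_adjust_cfa_offset " imm() }
    /^[ \t]*addl?[ \t]+[$][0-9]+,[ \t]*%esp/ { print ".cfi_adjust_cfa_offset -" imm() }
  '
}

# Sample input: save %ebp, reserve 8 bytes, release them, restore %ebp.
printf '\tpush %%ebp\n\tsub $8,%%esp\n\tadd $8,%%esp\n\tpop %%ebp\n' | track_cfa
```

For the four-line sample, the instructions are followed by adjustments of 4, 8,
-8 and -4 respectively, so the offsets sum to zero by the end. Anything that
moves %esp in another way (for example a register-to-register mov) passes
through unannotated, which is exactly the limitation described above.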
---

Dear muslers,

The AWK script here has been tweaked and made more robust in response to
suggestions from Szabolcs Nagy.

I've noticed that using tempfiles for the augmented asm has a drawback: the
source file/line debugging info generated by the assembler records the source
file as "/tmp/<random-garbage>". Then, when you debug a program linked against
the resulting musl, GDB tries to open "/tmp/<random-garbage>" to show in the
source window.

Suggestions? Perhaps generate .cfi.s files, as Szabolcs suggested?
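
A minimal sketch of the .cfi.s idea, assuming the generated file is simply
written next to the object file. The helper names cfi_asm_path and gen_cfi_asm
are hypothetical and not part of this patch. Note that this only makes the
recorded path stable; the debug info would still point at the generated file
rather than at the original source.

```sh
#!/bin/sh
# Sketch only: derive a stable name for the augmented asm from the object
# file name (foo.o -> foo.cfi.s) instead of using a random name under /tmp.
cfi_asm_path() {
  printf '%s\n' "${1%.o}.cfi.s"
}

# gen_cfi_asm AWK_SCRIPT INPUT_S OUTPUT_O
# Runs the awk script over the input and prints the path of the generated
# file, which can then be handed to the assembler instead of the original.
gen_cfi_asm() {
  out=$(cfi_asm_path "$3") &&
  LC_ALL=C awk -f "$1" "$2" >"$out" &&
  printf '%s\n' "$out"
}
```

A wrapper could then call something like
input=$(gen_cfi_asm "tools/add-cfi.awk.$arch" "$input" "$output") before
invoking the assembler, leaving the .cfi.s file in the build tree where GDB can
find it later.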

Thanks,
Alex Dowad


 Makefile               |   2 +-
 tools/add-cfi.awk.i386 | 176 +++++++++++++++++++++++++++++++++++++++++++++++++
 tools/aswrap.sh        |  15 +++++
 3 files changed, 192 insertions(+), 1 deletion(-)
 create mode 100644 tools/add-cfi.awk.i386
 create mode 100755 tools/aswrap.sh

diff --git a/Makefile b/Makefile
index 6559295..9aefd62 100644
--- a/Makefile
+++ b/Makefile
@@ -118,7 +118,7 @@ $(foreach s,$(wildcard src/*/$(ARCH)*/*.s),$(eval $(call mkasmdep,$(s))))
 	$(CC) $(CFLAGS_ALL_STATIC) -c -o $@ $(dir $<)$(shell cat $<)
 
 %.o: $(ARCH)/%.s
-	$(CC) $(CFLAGS_ALL_STATIC) -c -o $@ $<
+	tools/aswrap.sh $< $@ $(ARCH) "$(CC) $(CFLAGS_ALL_STATIC)"
 
 %.o: %.c $(GENH) $(IMPH)
 	$(CC) $(CFLAGS_ALL_STATIC) -c -o $@ $<
diff --git a/tools/add-cfi.awk.i386 b/tools/add-cfi.awk.i386
new file mode 100644
index 0000000..116fba8
--- /dev/null
+++ b/tools/add-cfi.awk.i386
@@ -0,0 +1,176 @@
+# Insert GAS CFI directives ("call frame information") into x86-32 asm input
+#
+# CFI directives tell the assembler how to generate "stack frame" debug info
+# This information can tell a debugger (like gdb) how to find the current stack
+#   frame at any point in the program code, and how to find the values which
+#   various registers had at higher points in the call stack
+# With this information, the debugger can show a backtrace, and you can move up
+#   and down the call stack and examine the values of local variables
+
+BEGIN {
+  # don't put CFI data in the .eh_frame ELF section (which we don't keep)
+  print ".cfi_sections .debug_frame"
+
+  # only emit CFI directives inside a function
+  in_function = 0
+}
+
+function hex2int(str,   i) {
+  str = tolower(str)
+
+  for (i = 1; i <= 16; i++) {
+    char = substr("0123456789abcdef", i, 1)
+    lookup[char] = i-1
+  }
+
+  result = 0
+  for (i = 1; i <= length(str); i++) {
+    result = result * 16
+    char   = substr(str, i, 1)
+    result = result + lookup[char]
+  }
+  return result
+}
+
+function get_const1() {
+  # for instructions with 2 operands, get 1st operand (assuming it is constant)
+  match($0, /-?(0x[0-9a-fA-F]+|[0-9]+),/)
+  return parse_const(substr($0, RSTART, RLENGTH-1))
+}
+function parse_const(const) {
+  if (substr(const, 1, 1) == "-") {
+    if (substr(const, 2, 2) == "0x") {
+      return -hex2int(substr(const, 4, length(const)-3))
+    } else {
+      return const
+    }
+  } else {
+    if (substr(const, 1, 2) == "0x") {
+      return hex2int(substr(const, 3, length(const)-2))
+    } else {
+      return const
+    }
+  }
+}
+function get_reg() {
+  # only use if you already know there is 1 and only 1 register
+  match($0, /%e(ax|bx|cx|dx|si|di|bp)/)
+  return substr($0, RSTART+1, 3)
+}
+function get_reg1() {
+  # for instructions with 2 operands, get 1st operand (assuming it is register)
+  match($0, /%e(ax|bx|cx|dx|si|di|bp)\s*,/)
+  return substr($0, RSTART+1, 3)
+}
+function get_reg2() {
+  # for instructions with 2 operands, get 2nd operand (assuming it is register)
+  match($0, /,\s*%e(ax|bx|cx|dx|si|di|bp)/)
+  return substr($0, RSTART+RLENGTH-3, 3)
+}
+
+function adjust_sp_offset(delta) {
+  if (in_function) {
+    printf ".cfi_adjust_cfa_offset %d\n", delta
+  }
+}
+
+{ print }
+
+/^\.global\s+\w+/ {
+  globals[$2] = 1
+}
+/^\w+:/ {
+  label = substr($1, 1, length($1)-1) # drop trailing :
+
+  if (globals[label]) {
+    if (in_function)
+      print ".cfi_endproc"
+
+    in_function = 1
+    print ".cfi_startproc"
+
+    for (register in saved)
+      delete saved[register]
+    for (register in dirty)
+      delete dirty[register]
+  }
+}
+
+# KEEPING UP WITH THE STACK POINTER
+# We do NOT attempt to understand foolish and ridiculous tricks like stashing
+#   the stack pointer and then using %esp as a scratch register, or bitshifting
+#   it or taking its square root or anything stupid like that.
+# %esp should only be adjusted by pushing/popping or adding/subtracting constants
+#
+/pushl?/ {
+  if (match($0, /\s+%(ax|bx|cx|dx|di|si|bp|sp)/))
+    adjust_sp_offset(2)
+  else
+    adjust_sp_offset(4)
+}
+/popl?/ {
+  if (match($0, /\s+%(ax|bx|cx|dx|di|si|bp|sp)/))
+    adjust_sp_offset(-2)
+  else
+    adjust_sp_offset(-4)
+}
+/addl?\s+\$-?(0x[0-9a-fA-F]+|[0-9]+),\s*%esp/ { adjust_sp_offset(-get_const1()) }
+/subl?\s+\$-?(0x[0-9a-fA-F]+|[0-9]+),\s*%esp/ { adjust_sp_offset(get_const1()) }
+
+# TRACKING REGISTER VALUES FROM THE PREVIOUS STACK FRAME
+#
+/pushl?\s+%e(ax|bx|cx|dx|si|di|bp)/ { # don't match "push (%reg)"
+  # if a register is being pushed, and its value has not changed since the
+  #   beginning of this function, the pushed value can be used when printing
+  #   local variables at the next level up the stack
+  # emit '.cfi_rel_offset' for that
+
+  if (in_function) {
+    register = get_reg()
+    if (!saved[register] && !dirty[register]) {
+      printf ".cfi_rel_offset %s,0\n", register
+      saved[register] = 1
+    }
+  }
+}
+
+/movl?\s+%e(ax|bx|cx|dx|si|di|bp),\s*-?(0x[0-9a-fA-F]+|[0-9]+)?\(%esp\)/ {
+  if (in_function) {
+    register = get_reg()
+    if (match($0, /-?(0x[0-9a-fA-F]+|[0-9]+)\(%esp\)/)) {
+      offset = parse_const(substr($0, RSTART, RLENGTH-6))
+    } else {
+      offset = 0
+    }
+    if (!saved[register] && !dirty[register]) {
+      printf ".cfi_rel_offset %s,%d\n", register, offset
+      saved[register] = 1
+    }
+  }
+}
+
+# IF REGISTER VALUES ARE UNCEREMONIOUSLY TRASHED
+# ...then we want to know about it.
+#
+function trashed(register) {
+  if (in_function && !saved[register] && !dirty[register]) {
+    printf ".cfi_undefined %s\n", register
+  }
+  dirty[register] = 1
+}
+# this does NOT exhaustively check for all possible instructions which could
+# overwrite a register value inherited from the caller (just the common ones)
+/mov.*,%e(ax|bx|cx|dx|si|di|bp)/  { trashed(get_reg2()) }
+/(add|addl|sub|subl|and|or|xor|lea|sal|sar|shl|shr)\s+%e(ax|bx|cx|dx|si|di|bp),/ {
+  trashed(get_reg1())
+}
+/i?mul\s+[^,]*$/                    { trashed("eax"); trashed("edx") }
+/i?mul\s+%e(ax|bx|cx|dx|si|di|bp),/ { trashed(get_reg1()) }
+/^(\w+:)?\s*i?div/                  { trashed("eax"); trashed("edx") }
+/(dec|inc|not|neg|pop)\s+%e(ax|bx|cx|dx|si|di|bp)/  { trashed(get_reg()) }
+/^(\w+:)?\s*cpuid/ { trashed("eax"); trashed("ebx"); trashed("ecx"); trashed("edx") }
+
+END {
+  if (in_function)
+    print ".cfi_endproc"
+}
\ No newline at end of file
diff --git a/tools/aswrap.sh b/tools/aswrap.sh
new file mode 100755
index 0000000..0afbd4e
--- /dev/null
+++ b/tools/aswrap.sh
@@ -0,0 +1,15 @@
+#!/bin/sh
+# Run assembler to produce an object file, optionally applying other pre-processing steps
+input=$1
+output=$2
+arch=$3
+as=$4
+
+if [ -f "tools/add-cfi.awk.$arch" ]; then
+  tmpfile=$(mktemp -t musl-aswrap-XXXXXX)
+  awk -f "tools/add-cfi.awk.$arch" "$input" >"$tmpfile"
+  mv "$tmpfile" "$tmpfile.s"
+  input=$tmpfile.s
+fi
+
+$as -c -o "$output" "$input"
\ No newline at end of file
-- 
2.0.0.GIT



* Re: [PATCH v3] Build process uses script to add CFI directives to x86 asm
@ 2015-05-15 17:31 Alex Dowad
  0 siblings, 0 replies; 5+ messages in thread
From: Alex Dowad @ 2015-05-15 17:31 UTC (permalink / raw)
  To: musl

Dear Szabolcs Nagy (and other interested parties),

> you can use
> 
>  .file "foo.s"

Thanks for the idea! Unfortunately, implementing it has proved very troublesome.

The GAS documentation does refer to ".file <source file>". There's just a tiny little
problem with it -- it doesn't actually work for the desired purpose. It does
register <source file> as a dependency, so it will be included in the dependency
file which is written out if you invoke GAS with the --MD option. But that's about it.

If you look at asm generated by GCC, it includes a ".file <source file>" line at the
top, but it *also* includes a ".file <number> <source file>" line. Which actually
sets the "source file" in the debugging info! (Yay!) Subsequent ".loc" directives
use the source file number when identifying source lines.

So we just use ".file <number> <source file>" and everybody is happy, right? Yes?
Good? Right?

Wrong.

Allow me to quote gas/dwarf2dbg.c:596-598 from the binutils repo:

  /* A .file directive implies compiler generated debug information is
     being supplied.  Turn off gas generated debug info.  */
  debug_type = DEBUG_NONE;

Snap.

Normally, GAS automatically generates debug info on source line numbers, and a few
other basic things. As soon as you use a ".file <number> <source file>" directive,
all that automatic debugging output is shut off, and you have to use explicit
assembler directives for *everything*.

I guess this makes sense, because if the "source file" is a completely
different file from the input asm file, the automatically generated line number
info will be completely wrong.

What a pain! Well, I guess I'll just have to use my own, explicit ".loc" directives.
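
A rough sketch of what explicit ".loc" directives could look like, assuming the
line numbers are taken from the original .s file before any CFI lines are
inserted. The helper name add_loc and the heuristic for spotting instruction
lines are assumptions for illustration, not taken from the patch.

```sh
#!/bin/sh
# Sketch only: prefix each instruction line with ".loc 1 <line>" so the debug
# info refers to the original file and line. add_loc is a hypothetical helper.
# $1 = source file name to record in the debug info; asm is read from stdin.
add_loc() {
  LC_ALL=C awk -v src="$1" '
    BEGIN { printf ".file 1 \"%s\"\n", src }
    {
      line = $0
      sub(/^[ \t]+/, "", line)
      # Crude heuristic: skip blank lines, directives and comments (leading
      # "." or "#") and lines that consist only of a label.
      if (line != "" && line !~ /^[.#]/ && line !~ /:[ \t]*$/)
        printf ".loc 1 %d\n", NR
      print
    }
  '
}

printf '.global f\nf:\n\tmov $1,%%eax\n\tret\n' | add_loc f.s
```

Because NR counts lines of the original input, the numbers stay correct even if
a later pass inserts .cfi directives between the instructions.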

> i think passing down the build command that way is not ok

It does seem like a hack -- but I'm not sure what a better way to do it is.

(I'm not a "real" shell programmer, if you hadn't noticed yet. I just fake it using
some combination of Stack Overflow and manpages.)

> i think
>
> pushl $123
> push $123
>
> are different

'push %eax' and 'pushl %eax' assemble to exactly the same machine code. Likewise,
'push $1' and 'pushl $1' assemble just the same.

Interestingly, 'pushl %ax' assembles to 'push %eax', whereas 'push %ax' stays 'push %ax'.

> set LC_ALL=C because you depend on collation order
> in the awk script

Please see if I did this right in the v4.

> add new lines at the end

Done.

Thanks for other enhancements to the awk script (I will credit you in the commit log
message).

Kind regards, AD


Thread overview: 5+ messages
2015-05-13 17:54 [PATCH v3] Build process uses script to add CFI directives to x86 asm Alex Dowad
2015-05-13 19:22 ` Szabolcs Nagy
2015-05-14  2:57   ` Rich Felker
2015-05-14 10:25     ` Szabolcs Nagy
2015-05-15 17:31 Alex Dowad

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/
