From: Szabolcs Nagy
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: [PATCH] First prototype of script which adds CFI directives to x86 asm
Date: Wed, 13 May 2015 13:39:18 +0200
Message-ID: <20150513113918.GE31118@port70.net>
References: <1431466124-2848-1-git-send-email-alexinbeijing@gmail.com>
In-Reply-To: <1431466124-2848-1-git-send-email-alexinbeijing@gmail.com>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com

* Alex Dowad [2015-05-12 23:28:44 +0200]:
> diff --git a/Makefile b/Makefile
> index 6559295..f7335aa 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -118,7 +118,11 @@ $(foreach s,$(wildcard src/*/$(ARCH)*/*.s),$(eval $(call mkasmdep,$(s))))
>  	$(CC) $(CFLAGS_ALL_STATIC) -c -o $@ $(dir $<)$(shell cat $<)
>
>  %.o: $(ARCH)/%.s
> +ifeq ($(ARCH),i386)
> +	awk -f tools/add-cfi.awk.i386 $< | $(CC) $(CFLAGS_ALL_STATIC) -x assembler -c -o $@ -
> +else
>  	$(CC) $(CFLAGS_ALL_STATIC) -c -o $@ $<
> +endif
>

it might make sense to have a make rule that produces
the cfi asm for inspection

%.cfi.s: $(ARCH)/%.s
	$(ADD_CFI) $< > $@
%.o: %.cfi.s
	$(CC) $(CFLAGS_ALL_STATIC) -c -o $@ $<

(unless ppl feel that adds too much clutter..
the clean rule should be adjusted too, but make
sometimes removes the intermediates based on some
mysterious logic)

i wonder if a configure check for .cfi support
should be added: in theory an assembler may not
support it (tcc?)

> +++ b/tools/add-cfi.awk.i386
> @@ -0,0 +1,152 @@
> +# Insert GAS CFI directives ("control frame information") into x86-32 asm input
> +#
> +# CFI directives tell the assembler how to generate "stack frame" debug info
> +# This information can tell a debugger (like gdb) how to find the current stack
> +# frame at any point in the program code, and how to find the values which
> +# various registers had at higher points in the call stack
> +# With this information, the debugger can show a backtrace, and you can move up
> +# and down the call stack and examine the values of local variables
> +
> +BEGIN {
> +  # don't put CFI data in the .eh_frame ELF section (which we don't keep)
> +  print ".cfi_sections .debug_frame"
> +
> +  # only emit CFI directives inside a function callable from C
> +  # (blindly emitting a '.cfi_startproc' at the beginning of each file and
> +  # '.cfi_endproc' at the end doesn't work)
> +  in_function = 0
> +}
> +
> +function hex2int(str) {
> +  str = tolower(str)
> +
> +  for (i = 1; i <= 16; i++) {
> +    char = substr("0123456789abcdef", i, 1)
> +    lookup[char] = i-1
> +  }
> +

move this loop to BEGIN so it only runs at startup

(in a more complex script you should mark local variables
otherwise everything is in the global namespace and function
calls clobber each other's temporaries..
here it is ok as there aren't many nested calls, but i tend to add
variables like "i" to the argument list)

> +  result = 0
> +  for (i = 1; i <= length(str); i++) {
> +    result = result * 16
> +    char = substr(str, i, 1)
> +    result = result + lookup[char]
> +  }
> +  return result
> +}
> +
> +function get_const() {
> +  # only use if you already know there is 1 and only 1 constant
> +  match($0, /\$[0-9a-fA-F]+/)
> +  return hex2int(substr($0, RSTART+1, RLENGTH-1))
> +}

i think hex conversion for $123 is wrong in i386 asm
you should only hex convert $0x123

> +function get_reg() {
> +  # only use if you already know there is 1 and only 1 register
> +  match($0, /%e(ax|bx|cx|dx|si|di|bp)/)
> +  return substr($0, RSTART+1, RLENGTH-1)
> +}
> +function get_reg1() {
> +  # for instructions with 2 operands, get 1st operand (assuming it is register)
> +  match($0, /%e(ax|bx|cx|dx|si|di|bp),/)
> +  return substr($0, RSTART+1, RLENGTH-2)
> +}
> +function get_reg2() {
> +  # for instructions with 2 operands, get 2nd operand (assuming it is register)
> +  match($0, /,%e(ax|bx|cx|dx|si|di|bp)/)
> +  return substr($0, RSTART+2, RLENGTH-2)

allow whitespace between ',' and the regs

(or even better: have a global rule to canonicalize the code
removing whitespace, comments etc, but be careful about constants eg.
avoid changing .ascii "foo")

http://sourceware.org/binutils/docs/as/Constants.html

> +}
> +
> +function adjust_sp_offset(delta) {
> +  if (in_function) {
> +    printf ".cfi_adjust_cfa_offset %d\n", delta
> +  }
> +}
> +
> +{ print }
> +
> +/\.type.*,@function/ {
> +  if (in_function) {
> +    print ".cfi_endproc"
> +  }
> +

(currently this will match inside comments)

i noticed that crt/ asm does not have .type directives
i wonder if that's intentional
(missing .cfi_startproc/endproc might be problematic i think
because .cfi directives can be rejected outside of startproc/endproc)

note that there are functions with aliases where you have

.global foo
.global bar
.type foo,@function
.type bar,@function
foo:
bar:
    ... code

so there will be an empty startproc/endproc there..
not sure if that causes any problems

> +  print ".cfi_startproc"
> +  in_function = 1
> +
> +  for (register in saved)
> +    delete saved[register]
> +  for (register in dirty)
> +    delete dirty[register]
> +}
> +
> +# KEEPING UP WITH THE STACK POINTER
> +# We do NOT attempt to understand foolish and ridiculous tricks like stashing
> +# the stack pointer and then using %esp as a scratch register, or bitshifting
> +# it or taking its square root or anything stupid like that.
> +# %esp should only be adjusted by pushing/popping or adding/subtracting constants
> +#
> +/pushl?/ { adjust_sp_offset(4) }
> +/popl?/ { adjust_sp_offset(-4) }

i think it's possible to push 2 bytes (push %ax)

> +# TODO: can add/sub instructions also specify offset in decimal?

yes :)

> +# TODO: can offset be negative?
yes $-123 is a valid constant
(it should be the same as $0xffffff85 in 32bit contexts)

> +/addl?\s+\$[0-9a-fA-F]+,%esp/ { adjust_sp_offset(-get_const()) }
> +/subl?\s+\$[0-9a-fA-F]+,%esp/ { adjust_sp_offset(get_const()) }
> +
> +# TRACKING REGISTER VALUES FROM THE PREVIOUS STACK FRAME
> +#
> +/pushl?\s+%e(ax|bx|cx|dx|si|di|bp)/ { # don't match "push (%reg)"
> +  # if a register is being pushed, and its value has not changed since the
> +  # beginning of this function, the pushed value can be used when printing
> +  # local variables at the next level up the stack
> +  # emit '.cfi_rel_offset' for that
> +
> +  if (in_function) {
> +    register = get_reg()
> +    if (!saved[register] && !dirty[register]) {
> +      printf ".cfi_rel_offset %s,0\n", register
> +      saved[register] = 1
> +    }
> +  }
> +}
> +
> +# TODO: this should also understand hex offsets prefixed with 0x or -0x
> +/movl?\s+%e(ax|bx|cx|dx|si|di|bp),-?[0-9]*\(%esp\)/ {

yes, i think it can be hex

> +  if (in_function) {
> +    register = get_reg()
> +    if (match($0, /-?[0-9]+\(%esp\)/)) {
> +      offset = substr($0, RSTART, RLENGTH-6) # decimal, not hex!
> +    } else {
> +      offset = 0
> +    }
> +    if (!saved[register] && !dirty[register]) {
> +      printf ".cfi_rel_offset %s,%d\n", register, offset
> +      saved[register] = 1
> +    }
> +  }
> +}
> +
> +# IF REGISTER VALUES ARE UNCEREMONIOUSLY TRASHED
> +# ...then we want to know about it.
> +#
> +function trashed(register) {
> +  if (in_function && !saved[register] && !dirty[register]) {
> +    printf ".cfi_undefined %s\n", register
> +  }
> +  dirty[register] = 1
> +}
> +# this does NOT exhaustively check for all possible instructions which could
> +# overwrite a register value inherited from the caller (just the common ones)
> +# TODO: detect when ax/ah/al/etc. are trashed -- means eax is no longer usable either
> +/mov.*,%e(ax|bx|cx|dx|si|di|bp)/ { trashed(get_reg2()) }
> +/(add|addl|sub|subl|and|or|xor|lea|sal|sar|shl|shr)\s+%e(ax|bx|cx|dx|si|di|bp),/ {
> +  trashed(get_reg1())
> +}
> +/i?mul\s+[^,]*$/ { trashed("eax"); trashed("edx") }
> +/i?mul\s+%e(ax|bx|cx|dx|si|di|bp),/ { trashed(get_reg1()) }
> +/^(\w+:)?\s*i?div/ { trashed("eax"); trashed("edx") }
> +/(dec|inc|not|neg|pop)\s+%e(ax|bx|cx|dx|si|di|bp)/ { trashed(get_reg()) }
> +/^(\w+:)\s*cpuid/ { trashed("eax"); trashed("ebx"); trashed("ecx"); trashed("edx") }
> +
> +END {
> +  if (in_function) {
> +    print ".cfi_endproc"
> +  }
> +}
> \ No newline at end of file
> --
> 2.0.0.GIT
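
to make the hex2int/get_const comments concrete, here is a rough sketch
(not part of the patch, and not tested against the musl tree). it builds
the digit table once in BEGIN, declares temporaries as extra function
parameters so they stay local, and treats a constant as hex only when it
has a 0x prefix. a leading '-' is accepted too. parse_const is a made-up
shell wrapper so the awk can be run on its own, and get_const taking the
line as an argument is my change, not the patch's. octal ($010) and
binary (0b...) constants are deliberately not handled here.

```sh
#!/bin/sh
# Hypothetical wrapper (not in the patch): for each input line, print the
# decimal value of the first immediate constant such as $12, $0x10 or $-4.
parse_const() {
    awk '
    BEGIN {
        # build the hex digit table once, at startup
        for (i = 0; i < 16; i++)
            lookup[substr("0123456789abcdef", i + 1, 1)] = i
    }

    # the extra parameters (i, result) are how awk spells local variables
    function hex2int(str,    i, result) {
        str = tolower(str)
        result = 0
        for (i = 1; i <= length(str); i++)
            result = result * 16 + lookup[substr(str, i, 1)]
        return result
    }

    # returns "" when the line has no immediate constant
    function get_const(line,    tok, sign) {
        if (!match(line, /\$-?(0[xX][0-9a-fA-F]+|[0-9]+)/))
            return ""
        tok = substr(line, RSTART + 1, RLENGTH - 1)   # drop the "$"
        sign = 1
        if (substr(tok, 1, 1) == "-") {
            sign = -1
            tok = substr(tok, 2)
        }
        if (tok ~ /^0[xX]/)
            return sign * hex2int(substr(tok, 3))     # hex only with 0x
        return sign * (tok + 0)                       # otherwise decimal
    }

    {
        c = get_const($0)
        if (c != "")
            print c
    }
    '
}
```

with this, `subl $0x10,%esp` gives 16 and `addl $12,%esp` gives 12, whereas
running every constant through hex2int as the patch does would turn $12
into 18.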