clang-p2996

Author	SHA1	Message	Date
Ayke van Laethem	9592920890	[AVR] Optimize 32-bit shifts: optimize REG_SEQUENCE This pseudo-instruction stores two small (8-bit) registers into one wide (16-bit) register. But apparently the order matters a lot to the register allocator. This patch changes the order of inserting the registers to optimize for the best register allocation in the tests of shift32.ll. It might be detrimental in other cases, but keeping the registers in the same physical register seems like it would be a common case. Differential Revision: https://reviews.llvm.org/D140573	2023-01-08 20:05:31 +01:00
Ayke van Laethem	fad5e0cf50	[AVR] Optimize 32-bit shifts: reverse shift + move This optimization turns shifts of almost a multiple of 8 into a shift into the opposite direction. Unfortunately it doesn't compose well with the other optimizations (I've tried) so it's separate from them. Differential Revision: https://reviews.llvm.org/D140572	2023-01-08 20:05:31 +01:00
Ayke van Laethem	81f5f22f27	[AVR] Optimize 32-bit shifts: shift by 4 bits This uses a complicated shift sequence that avr-gcc also uses, but extended to work over any number of bytes and in both directions (logical shift left and logical shift right). Unfortunately it can't be used for an arithmetic shift right: I've tried to come up with a sequence but couldn't. Differential Revision: https://reviews.llvm.org/D140571	2023-01-08 20:05:31 +01:00
Ayke van Laethem	8f8afabd32	[AVR] Optimize 32-bit shift: move bytes around This patch optimizes 32-bit constant shifts by renaming registers. This is very effective as the compiler would otherwise need to do a lot of single bit shift instructions. Instead, the registers are renamed at the SSA level which means the register allocator will insert the necessary mov instructions. Unfortunately, the register allocator will insert some unnecessary movs with the current code. This will be fixed in a later patch. Differential Revision: https://reviews.llvm.org/D140570	2023-01-08 20:05:31 +01:00
Ayke van Laethem	840d10a1d2	[AVR] Custom lower 32-bit shift instructions 32-bit shift instructions were previously expanded using the default SelectionDAG expander, which meant it used 16-bit constant shifts and ORed them together. This works, but is far from optimal. I've optimized 32-bit shifts on AVR using a custom inserter. This is done using three new pseudo-instructions that take the upper and lower bits of the value in two separate 16-bit registers and output two 16-bit registers. This is the first commit in a series. When completed, constant 32-bit shifts will take around 31% fewer instructions on average, and will in all cases be equal to or better than the old behavior. It also tends to match or outperform avr-gcc: the only cases where avr-gcc does better are when it uses a loop to shift, or when the LLVM register allocator inserts some unnecessary movs. But it even outperforms avr-gcc in some cases where avr-gcc does not use a loop. As a side effect, non-constant 32-bit shifts also become more efficient. For some real-world differences: the build of compiler-rt I use in TinyGo becomes 2.7% smaller and the build of picolibc I use becomes 0.9% smaller. I think picolibc is a better representation of real-world code, but even a ~1% reduction in code size is really significant. The current patch just lays the groundwork. The result is actually a regression in code size. Later patches will use this as a basis to optimize these shift instructions. Differential Revision: https://reviews.llvm.org/D140569	2023-01-08 20:05:31 +01:00
Ayke van Laethem	0408b131eb	[SelectionDAG][AVR] Add support for lrint and lround intrinsics Integer legalization already supported splitting the output integer of llround and llrint, but did not support this for lround and lrint yet. This is not a problem for 32-bit architectures, but for 8/16-bit architectures like AVR it results in a crash like this: ExpandIntegerResult #0: t7: i32 = lround t6 LLVM ERROR: Do not know how to expand the result of this operator! This patch simply adds lrint/lround to the list of ISD opcodes to expand. Fixes https://github.com/llvm/llvm-project/issues/59573. Differential Revision: https://reviews.llvm.org/D140822	2023-01-08 18:56:07 +01:00
Ayke van Laethem	167338de96	[AVR] correctly declare __do_copy_data and __do_clear_bss These two symbols are declared in object files to indicate whether .data needs to be copied from flash or .bss needs to be cleared. They are supported on avr-gcc and reduce firmware size a bit, which is especially important on very small chips. I checked the behavior of avr-gcc and matched it as well as possible. From my investigation, it seems to work as follows: __do_copy_data is set when the compiler finds a data symbol: * without a section name * with a section name starting with ".data" or ".gnu.linkonce.d" * with a section name starting with ".rodata" or ".gnu.linkonce.r" and flash and RAM are in the same address space __do_clear_bss is set when the compiler finds a data symbol: * without a section name * with a section name that starts with .bss Simply checking whether the calculated section name starts with ".data", ".rodata" or ".bss" should result in the same behavior. Fixes: https://github.com/llvm/llvm-project/issues/58857 Differential Revision: https://reviews.llvm.org/D140830	2023-01-08 18:56:06 +01:00
Roman Lebedev	3bb5ddd175	[NFC][Codegen][AVR] Make shift.ll autogenerate-able	2022-12-24 19:26:42 +03:00
Ben Shi	a59e96f1a1	[AVR] Select 16-bit LDS/STS for load/store on AVRTiny. The 32-bit LDS/STS are not available on AVRTiny, so we have to use their compact 16-bit form for memory access. Reviewed By: aykevl Differential Revision: https://reviews.llvm.org/D139687	2022-12-23 11:03:45 +08:00
Ben Shi	c41d425030	[AVR][MC] Fix illegal operand forms. These operand forms are illegal and rejected by avr-gcc: subi r24, -lo8(symbol+offset) sbci r25, -hi8(symbol+offset) Their correct form is: subi r24, lo8(-(symbol+offset)) sbci r25, hi8(-(symbol+offset)) Reviewed By: aykevl Differential Revision: https://reviews.llvm.org/D140473	2022-12-23 09:48:06 +08:00
Ben Shi	3730f13428	[AVR] Fix a bug in AsmPrinter when printing memory operands. Reviewed By: aykevl Differential Revision: https://reviews.llvm.org/D140383	2022-12-23 09:42:29 +08:00
Ayke van Laethem	d1d3005c9f	[AVR] Do not emit instructions invalid for attiny10 The attiny4/attiny5/attiny9/attiny10 have a slightly modified instruction set that drops a number of useful instructions. This patch makes sure to not emit them on these "reduced tiny" cores. The affected instructions are: * lds and sts (load/store directly from data) * ldd and std (load/store with displacement) * adiw and sbiw (add/sub register pairs) * various other instructions that were emitted without checking whether the chip actually supports them (movw, adiw, etc) There is a variant on lds and sts on these chips, but it can only address a limited portion of the address space and is mainly useful to load/store I/O registers (as an extension to the in and out instructions). I have not implemented it here, implementing it can be done in a separate patch. This patch is not optimal. I'm sure it can be improved a lot. For example, we could teach the instruction selector to not select lddw/stdw instructions so that the weird pointer adjustments are not necessary. But for now I've focused just on correctness, not on code quality. Updates: https://github.com/llvm/llvm-project/issues/53459 Differential Revision: https://reviews.llvm.org/D131867	2022-12-22 17:04:53 +01:00
Ayke van Laethem	5527b21516	[AVR] Do not use R0/R1 on avrtiny This patch makes sure the compiler uses R16/R17 on avrtiny (attiny10 etc) instead of R0/R1. Some notes: * For the NEGW and ROLB instructions, it adds an explicit zero register. This is necessary because the zero register is different on avrtiny (and InstrInfo Uses lines need a fixed register). * Not entirely sure about putting all tests in features/avr-tiny.ll, but it doesn't seem like the "target-cpu"="attiny10" attribute works. Updates: https://github.com/llvm/llvm-project/issues/53459 Differential Revision: https://reviews.llvm.org/D138582	2022-11-28 18:05:55 +01:00
Ayke van Laethem	91ae1afd3c	[AVR] Remove unused register scavenger The LPMW/ELPMW instruction can be modified to use an earlyclobber, which prevents it from using the Z register as an output register. Also see: https://reviews.llvm.org/D131844 Differential Revision: https://reviews.llvm.org/D117957	2022-11-27 15:31:12 +01:00
Ben Shi	f452b9dcaf	[AVR] Fix wrong ABI of AVRTiny. A scalar which exceeds 4 bytes should be returned via the stack, rather than via registers, on an AVRTiny device. Reviewed By: aykevl Differential Revision: https://reviews.llvm.org/D138201	2022-11-23 09:32:47 +08:00
Ayke van Laethem	a560e57a7e	[AVR] Only push and clear R1 in interrupts when necessary R1 is a reserved register, but LLVM gives the APIs to know when it is used or not. So this patch uses these APIs to only save/clear/restore R1 in interrupts when necessary. The main issue here was getting inline assembly to work. One could argue that this is the job of Clang, but for consistency I've made sure that R1 is always usable in inline assembly even if that means clearing it when it might not be needed. Information on inline assembly in AVR can be found here: https://www.nongnu.org/avr-libc/user-manual/inline_asm.html#asm_code Essentially, this seems to suggest that r1 can be freely used in avr-gcc inline assembly, even without specifying it as an input operand. Differential Revision: https://reviews.llvm.org/D117426	2022-08-15 14:29:38 +02:00
Ayke van Laethem	43a8dbc5be	[AVR] Use @earlyclobber instead of register scavenging The code to support the case when the register allocator has assigned the same register to the src and the dst register operand isn't actually needed: * LDWRdPtr and LDDWRdPtrQ have an @earlyclobber on the output register, so the register allocator will make sure to allocate a different register for the output register. * LDDWRdYQ does not have an @earlyclobber, but the pointer register is the fixed Y register which is reserved. The register allocator won't use reserved registers for the output value. This removes a special case in the code that makes the pseudo instruction expansion pass more complicated than it needs to be. Differential Revision: https://reviews.llvm.org/D131844	2022-08-15 14:29:38 +02:00
Ayke van Laethem	de48717fcf	[AVR] Support unaligned store This patch really just extends D39946 towards stores as well as loads. While the patch is in SelectionDAGBuilder, it only applies to AVR (the only target that supports unaligned atomic operations). Differential Revision: https://reviews.llvm.org/D128483	2022-08-15 14:29:37 +02:00
Patryk Wychowaniec	5650688e72	[AVR] Fix expanding MOVW for overlapping registers When expanding a MOVW (16-bit copy) to two MOVs (8-bit copy), the lower byte always comes first. This is incorrect for corner cases like '$r24r23 -> $r25r24', in which the higher byte copy should come first. Current patch fixes that bug as recorded at https://github.com/rust-lang/rust/issues/98167 Reviewed By: benshi001 Differential Revision: https://reviews.llvm.org/D128588	2022-06-26 17:20:07 +08:00
Ivan Kosarev	ad1d60c3be	[FileCheck] Catch misspelled directives. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D125604	2022-05-26 11:37:19 +01:00
Patryk Wychowaniec	6641c57aeb	[AVR] Always expand STDSPQRr & STDWSPQRr Currently, STDSPQRr and STDWSPQRr are expanded only during AVRFrameLowering - this means that if any of those instructions happen to appear _outside_ of the typical FrameSetup / FrameDestroy context, they wouldn't get substituted, eventually leading to a crash: ``` LLVM ERROR: Not supported instr: <MCInst XXX <MCOperand Reg:1> <MCOperand Imm:15> <MCOperand Reg:53>> ``` This commit fixes this issue by moving expansion of those two opcodes into AVRExpandPseudo. This bug was originally discovered due to the Rust compiler_builtins library. Its 0.1.37 release contained a 128-bit software division/remainder routine that exercised this buggy branch in the code. Reviewed By: benshi001 Differential Revision: https://reviews.llvm.org/D123528	2022-05-05 03:10:59 +00:00
Patryk Wychowaniec	d16a631c12	[AVR] Merge AVRRelaxMemOperations into AVRExpandPseudoInsts This commit contains a refactoring that merges AVRRelaxMemOperations into AVRExpandPseudoInsts, so that we have a single place in code that expands the STDWPtrQRr opcode. Seizing the day, I've also fixed a couple of potential bugs with our previous implementation (e.g. when the destination register was killed, the previous implementation would try to .addDef() that killed register, crashing LLVM in the process - that's fixed now, as proved by the test). Reviewed By: benshi001 Differential Revision: https://reviews.llvm.org/D122533	2022-04-11 02:42:13 +00:00
Ben Shi	bce2e208e0	[AVR] Optimize int16 arithmetic right shift for shift amount 7/14/15 Reviewed By: aykevl Differential Revision: https://reviews.llvm.org/D115618	2022-03-26 06:53:27 +00:00
Ben Shi	49b0b5f0fa	[AVR][NFC] Fix incorrect register states in expanding pseudo instructions Reviewed By: aykevl Differential Revision: https://reviews.llvm.org/D118354	2022-03-25 16:02:15 +00:00
Ben Shi	f319c24570	[AVR] Reject/Reserve R0~R15 on AVRTiny. Reviewed By: aykevl, dylanmckay Differential Revision: https://reviews.llvm.org/D121672	2022-03-24 02:33:51 +00:00
Ben Shi	d7afea9eb8	[AVR][MC] Emit some aliases for GPRs and IO registers Emit the following aliases (if available): .set __tmp_reg__, [0\|16] .set __zero_reg__, [1\|17] .set __SREG__, 63 .set __SP_H__, 62 .set __SP_L__, 61 .set __EIND__, 60 .set __RAMPZ__, 59 Reviewed By: aykevl Differential Revision: https://reviews.llvm.org/D119807	2022-03-24 02:08:22 +00:00
Ben Shi	45638931fb	[AVR] Generate 'rcall' instead of 'call' on avr2 and avr25 The 'call' (long call) instruction is available on avr3 and above, and devices in avr2 and avr25 should use the 'rcall' (short call) instruction for function calls. Reviewed By: aykevl, dylanmckay Differential Revision: https://reviews.llvm.org/D121539	2022-03-23 02:00:15 +00:00
Ben Shi	3fd9a320da	[AVR] Fix incorrect calling convention for varargs functions An i8 argument should only cost 1 byte on the stack. This is compatible with avr-gcc. More calling convention test cases are also added. Reviewed By: aykevl, dylanmckay Differential Revision: https://reviews.llvm.org/D121767	2022-03-23 02:00:15 +00:00
Ben Shi	fa2d31e9e6	[AVR] Fix a potential assert failure Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D119416	2022-02-11 02:25:58 +00:00
Ayke van Laethem	44ee9864a4	[AVR][NFC] Make atomics tests easier to read Use the same mnemonics in the tests that are used in the AtomicLoadOp pattern ($rd, $rr) but use RR1 instead of $operand. This matches similar tests in load8.ll. Differential Revision: https://reviews.llvm.org/D117991	2022-02-02 09:10:39 +01:00
Ayke van Laethem	316664783d	[AVR] Fix atomicrmw result value This patch fixes the atomicrmw result value to be the value before the operation instead of the value after the operation. This was a bug, left as a FIXME in the code (see https://reviews.llvm.org/D97127). From the LangRef: > The contents of memory at the location specified by the <pointer> > operand are atomically read, modified, and written back. The original > value at the location is returned. Doing this expansion early allows the register allocator to arrange registers in such a way that commutable operations are simply swapped around as needed, which results in shorter code while still being correct. Differential Revision: https://reviews.llvm.org/D117725	2022-02-02 09:10:39 +01:00
Ayke van Laethem	116ab78694	[AVR] Make use of the constant value 0 in R1 The register R1 is defined to have the constant value 0 in the avr-gcc calling convention (which we follow). Unfortunately, we don't really make use of it. This patch replaces `LDI 0` instructions with a copy from R1. This reduces code size: my AVR build of compiler-rt goes from 50660 to 50240 bytes of code size, which is a 0.8% reduction. Presumably it will also improve execution speed, although I didn't measure this. Differential Revision: https://reviews.llvm.org/D117425	2022-01-23 17:08:01 +01:00
Ayke van Laethem	153359180a	[AVR] Remove regalloc workaround for LDDWRdPtrQ Background: https://github.com/avr-rust/rust-legacy-fork/issues/126 In short, this workaround was introduced to fix a "ran out of registers during regalloc" issue. The root cause has since been fixed in https://reviews.llvm.org/D54218 so this workaround can be removed. There is one test that changes a little bit, removing a single instruction. I also compiled compiler-rt before and after this patch but didn't see a difference. So presumably the impact is very low. Still, it's nice to be able to remove such a workaround. Differential Revision: https://reviews.llvm.org/D117831	2022-01-23 17:08:00 +01:00
Ben Shi	94173dc24c	[AVR] Generate ELPM for loading byte/word from extended program memory Reviewed By: aykevl Differential Revision: https://reviews.llvm.org/D116493	2022-01-20 02:53:10 +00:00
Ben Shi	c1dd607463	[AVR][MC] Generate section '.progmemX.data' for extended flash banks Reviewed By: aykevl Differential Revision: https://reviews.llvm.org/D115987	2022-01-20 02:53:10 +00:00
Ayke van Laethem	ca27b026f9	[AVR] Do not clear r0 at interrupt entry There is no reason to do this: it's a scratch register and can therefore hold any arbitrary value. And because it is in an interrupt, this code is performance critical so it should be as short as possible. I believe r0 was cleared because of the following: 1. There used to be a bug that the cleared register was r0, not r1 as it should have been. 2. This was fixed in https://reviews.llvm.org/D99467, but left the code to clear r0. This patch completes D99467 by removing the `clr r0` instruction. Differential Revision: https://reviews.llvm.org/D116756	2022-01-19 14:22:13 +01:00
Ayke van Laethem	3d59d94a20	[AVR] Mark call-clobbered registers as clobbered in interrupt handlers I have matched the RISCV backend, which only uses the interrupt save list in getCalleeSavedRegs, _not_ in getCallPreservedMask. I don't know the details of these two methods, but with it, the correct amount of registers is saved and restored. Without this patch, practically all interrupt handlers that call a function will miscompile. I have added a test to verify this behavior. I've also added a very simple test to verify that more normal interrupt operations (in this case, incrementing a global value) behave as expected. Differential Revision: https://reviews.llvm.org/D116551	2022-01-19 14:22:13 +01:00
Ayke van Laethem	f41d2d9469	[AVR] Remove redundant dynalloca SP save/restore pass I think this pass was previously used under the assumption that most functions would not need a frame pointer and it would be more efficient to store the old stack pointer in a regular register pair. Unfortunately, right now we're forced to always reserve the Y register as a frame pointer: whether or not this is needed is only known after register allocation, at which point it doesn't make sense anymore to mark it as non-reserved. Therefore, it makes sense to use the Y register to store the old stack pointer in functions with dynamic allocas (with a variable size or not in the entry block). Knowing this can make the code around dynamic allocas a lot simpler: simply save/restore the frame pointer. This is especially relevant in functions that have a frame pointer anyway (for example, because they have stack spills). The stack restore in the epilogue will implicitly restore the old stack pointer, so there is no need to store the old stack pointer separately. It even reduces register pressure as a side effect. Differential Revision: https://reviews.llvm.org/D97815	2022-01-19 14:22:13 +01:00
Nikita Popov	f430c1eb64	[Tests] Add elementtype attribute to indirect inline asm operands (NFC) This updates LLVM tests for D116531 by adding elementtype attributes to operands that correspond to indirect asm constraints.	2022-01-06 14:23:51 +01:00
Ben Shi	99e7bf46c9	[AVR] Optimize int16 shift operation for shift amount greater than 8 Skip operation on the lower byte in int16 logical left shift when shift amount is greater than 8. Skip operation on the higher byte in int16 logical & arithmetic right shift when shift amount is greater than 8. Reviewed By: aykevl Differential Revision: https://reviews.llvm.org/D115594	2022-01-04 11:48:50 +00:00
Ben Shi	f4ef79306c	[AVR] Optimize int8 arithmetic right shift 6 bits Reviewed By: aykevl Differential Revision: https://reviews.llvm.org/D115593	2022-01-04 10:36:03 +00:00
Ben Shi	9fb4e79d06	Revert "[AVR] Optimize int8 arithmetic right shift 6 bits" This reverts commit `5723261370`. There are failures as reported in https://lab.llvm.org/buildbot#builders/16/builds/21638 https://lab.llvm.org/buildbot#builders/104/builds/5394	2022-01-04 04:14:15 +00:00
Ben Shi	5723261370	[AVR] Optimize int8 arithmetic right shift 6 bits Reviewed By: aykevl Differential Revision: https://reviews.llvm.org/D115593	2022-01-04 03:20:29 +00:00
Nico Weber	4f9a5c2a14	[asm] Remove explicit branch for modifier 'l' No intended behavior change. EmitGCCInlineAsmStr() used to explicitly check for modifier 'l' after handling block address and machine basic block operands. This prevented passing a MachineOperand with 'l' modifier to PrintAsmMemoryOperand(). Conceptually that seems kind of nice, but in practice the overrides of PrintAsmMemoryOperand() in all (*) AsmPrinter subclasses already reject modifiers they don't know about, and none of them know about 'l'. So removing this doesn't have a behavior difference, is less code, and it makes EmitGCCInlineAsmStr() and EmitMSInlineAsmStr() more similar, to prepare for merging them later. (Why not _add_ the branch to EmitMSInlineAsmStr() instead? Because that always works with X86AsmPrinter I think, and X86AsmPrinter::PrintAsmMemoryOperand() very decisively rejects the 'l' modifier, so it's hard to motivate adding that branch.) (*): The one exception was AVRAsmPrinter, which had an llvm_unreachable instead of returning true. So this commit changes that, so that the AVR target keeps emitting an error instead of crashing when passing a mem operand with a :l modifier to it. All the other targets already don't crash on this. Differential Revision: https://reviews.llvm.org/D114216	2021-11-19 09:19:53 -05:00
Guozhi Wei	6599961c17	[TwoAddressInstructionPass] Improve the SrcRegMap and DstRegMap computation This patch contains the following enhancements to SrcRegMap and DstRegMap: 1. In findOnlyInterestingUse, check not only whether the Reg is a two-address usage, but also whether it can be a two-address usage after commutation. 2. If a physical register is clobbered, remove SrcRegMap entries that are mapped to it. 3. In processTiedPairs, when creating a new COPY instruction, add a SrcRegMap entry only when the COPY instruction is coalescable (the COPY src is killed). With these enhancements, isProfitableToCommute can make better commute decisions, and ultimately more register copies are removed. Differential Revision: https://reviews.llvm.org/D108731	2021-10-11 15:28:31 -07:00
Matt Jacobson	75abeb64ce	[AVR] emit 'MCSA_Global' references to '__do_global_ctors' and '__do_global_dtors' Emit references to '__do_global_ctors' and '__do_global_dtors' to allow constructor/destructor routines to run. Reviewed by: MaskRay Differential Revision: https://reviews.llvm.org/D107133	2021-08-05 10:37:36 +08:00
Ayke van Laethem	4d7f5c0a85	[AVR] Only support sp, r0 and r1 in llvm.read_register Most other registers are allocatable and therefore cannot be used. This issue was flagged by the machine verifier, because reading other registers is considered reading from an undefined register. Differential Revision: https://reviews.llvm.org/D96969	2021-07-24 14:03:27 +02:00
Ayke van Laethem	41f905b211	[AVR] Fix rotate instructions This patch fixes some issues with the RORB pseudo instruction. - A minor issue in which the instructions were said to use the SREG, which is not true. - An issue with the BLD instruction, which did not have an output operand. - A major issue in which invalid instructions were generated. The fix also reduces RORB from 4 to 3 instructions, so it's also a small optimization. These issues were flagged by the machine verifier. Differential Revision: https://reviews.llvm.org/D96957	2021-07-24 14:03:26 +02:00
Ayke van Laethem	6aa9e746eb	[AVR] Expand large shifts early in IR This patch makes sure shift instructions such as this one: %result = shl i32 %n, %amount are expanded just before the IR to SelectionDAG conversion to a loop so that calls to non-existing library functions such as __ashlsi3 are avoided. The generated code is currently pretty bad but there's a lot of room for improvement: the shift itself can be done in just four instructions. Differential Revision: https://reviews.llvm.org/D96677	2021-07-24 14:03:26 +02:00
Ayke van Laethem	feda08b70a	[AVR] Do not chain stores in call frame setup Previously, AVRTargetLowering::LowerCall attempted to keep stack stores in order with chains. Perhaps this worked in the past, but it does not work now: it appears that the SelectionDAG legalization phase removes these chains. Therefore, I've removed these chains entirely to match X86 (which, similar to AVR, also prefers to use push instructions over stack-relative stores to set up a call frame). With this change, all the stack stores are in a somewhat reasonable order. Differential Revision: https://reviews.llvm.org/D97853	2021-07-24 14:03:26 +02:00

186 Commits