This is an experimental address space for strided buffers. These buffers
can have structs as elements and
a stride > 1.
These pointers allow indexed access in units of the stride, i.e., they
point at `buffer[index * stride]`.
Thus, we can use the `idxen` modifier for buffer loads.
We assign address space 9 to 192-bit buffer pointers which contain a
128-bit descriptor, a 32-bit offset and a 32-bit index. Essentially,
they are fat buffer pointers with an additional 32-bit index.
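A rough sketch (hypothetical names) of what access through such a
pointer looks like:

    ; %buf carries a 128-bit descriptor, a 32-bit offset, and a 32-bit
    ; index, and points at buffer[index * stride].
    %val = load i32, ptr addrspace(9) %buf  ; can lower to a buffer load using idxen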
LSR uses SCEVExpander to generate induction formulas. The expander
internally tries to reuse existing IR expressions. To do that, it needs
to strip any poison-generating flags (nsw, nuw, exact, nneg, etc.)
which may not be valid for the newly added users.
This is conservatively correct, but has the effect that LSR will strip
nneg flags on zext instructions involved in trip counts in loop
preheaders. To avoid this, this patch adjusts the expander to re-infer
the flags on the CSE candidate if doing so is legal for all possible users.
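For illustration (hypothetical names), LSR reusing a trip-count zext
like the one below would previously strip its flag:

    preheader:
      %wide.n = zext nneg i32 %n to i64

With this change, the expander re-infers nneg on the reused
instruction when %n can be shown non-negative for every user.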
This should fix the regression reported in
https://github.com/llvm/llvm-project/issues/71200.
This should arguably be done inside canReuseInstruction instead, but
doing it outside is more conservative compile-time-wise. Both
canReuseInstruction and isGuaranteedNotToBePoison walk operand lists, so
right now we are performing work which is roughly O(N^2) in the size of
the operand graph. We should fix that before making the per operand step
more expensive. My tentative plan is to land this, and then rework the
code to sink the logic into more core interfaces.
These tests rely on SCEV recognizing an "or" with no common
bits as an "add". Add the disjoint flag to relevant or instructions
in preparation for switching SCEV to use the flag instead of the
ValueTracking query. The IR with the disjoint flag matches what
InstCombine would produce.
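As an illustrative example (hypothetical values), an or whose operands
share no set bits can carry the flag and behaves like an add:

    %lo = and i32 %x, 15
    %hi = shl i32 %y, 4
    %r  = or disjoint i32 %lo, %hi  ; same value as add i32 %lo, %hi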
The current code structure results in cases where if a) we can't clone
the IV user (because it's not in our whitelist) or b) can't prove the
SCEV expressions are identical, we'd sometimes leave both the original
unwidened IV and the partially widened IV in the code. Instead, just
truncate the wide IV to the use - same as what we'd do if we couldn't
find an addrec to start with.
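A sketch of the fallback (hypothetical names): rather than keeping the
narrow IV alive alongside the widened one, the problematic use is fed
by a truncate of the wide IV:

    ; %iv.wide is the widened induction variable
    %use.narrow = trunc i64 %iv.wide to i32  ; feeds the un-cloneable use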
Noticed this while playing with changing how we produce addrecs. The
current structure results in a very tight interlock between SCEV's
internal capabilities and the indvars code.
As far as I can tell, there's nothing in this code which actually
assumes the two predicates in (FoundLHS FoundPred FoundRHS) => (LHS Pred
RHS) are the same.
Noticed while investigating something else, this is purely an
opportunistic optimization while I'm looking at the code. Unfortunately,
this doesn't solve my original problem. :)
IndVars has an existing notion of a narrow definition which is known to
be positive, and for which sign and zero extension are therefore the
same operation. There's existing logic for forming a SCEV based on the
extension kind and the no-wrap flags. This change extends that logic to
form the opposite extension kind for a positive def if doing so is
allowed by the flags. Note that we already do something analogous for
the getWideRecurrence case as well.
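An illustrative example (hypothetical names): if the narrow def %d is
known non-negative, both extension kinds agree, so either can be formed:

    %z = zext i32 %d to i64
    %s = sext i32 %d to i64  ; equal to %z whenever %d is non-negative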
Adding test coverage in advance of upcoming changes. Note that these
tests specifically use unsigned comparisons for the backends; the
signed versions are fairly well handled by existing logic.
zext nneg was recently added to the IR in #67982. This patch teaches
SimplifyIndVars to prefer zext nneg over *both* sext and plain zext,
when a local SCEV query indicates the source is non-negative.
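A before/after sketch (hypothetical names):

    ; before:
    %w = sext i32 %narrow to i64
    ; after, when SCEV locally proves %narrow is non-negative:
    %w = zext nneg i32 %narrow to i64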
The choice to prefer zext nneg over sext looks slightly aggressive
here, but probably isn't so much in practice. For cases where we'd
"remember" the range fact, instcombine would convert the sext into
a zext nneg anyways. The only cases where this produces a different
result overall are when SCEV knows a non-local fact, and it doesn't
get materialized into the IR. Those are exactly the cases where
using zext nneg is most useful. We do run the risk of e.g. a
missing combine - since we haven't updated most of them yet - but
that seems like a manageable risk.
Note that there are much deeper algorithmic changes we could make
to this code to exploit zext nneg, but this seemed like a reasonable
and low risk starting point.
zext nneg was recently added to the IR in #67982. Teaching SCEVExpander
to emit nneg when possible is valuable since SCEV may have proved
non-trivial facts about loop bounds which would otherwise be lost when
materializing the value.
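For example (hypothetical names), if the loop is guarded by %n > 0,
SCEV knows %n is non-negative even when nothing in the IR records that
fact, and the expander can now emit:

    %n.ext = zext nneg i32 %n to i64  ; instead of a plain zext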
There are many tests that specify a target triple/CPU flags but no
DataLayout which can lead to IR being generated that has unusual
behaviour. This commit attempts to use the default DataLayout based
on the relevant flags if there is no explicit override on the command
line or in the IR file.
One thing that is not currently possible is distinguishing a missing
datalayout from an explicitly empty one (`target datalayout = ""` in
the IR file), since the current APIs don't allow detecting this case.
If it is considered useful to support this case (instead of passing
"-data-layout=" on the command line), I can change the IR parsers to
track whether they have seen such a directive and change the callback
type.
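That is, a module containing the explicit directive below currently
looks the same to the consumer as one that omits the directive entirely:

    target datalayout = ""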
Differential Revision: https://reviews.llvm.org/D141060
When a SCEVCallbackVH is RAUWed, we currently do a def-use walk and
remove dependent instructions from the ValueExprMap. However, unlike
SCEV's usual invalidation, this does not forget memoized values.
The end result is that we might end up removing a SCEVUnknown from the
map, while that expression still has users. Due to that, we may later
fail to invalidate those expressions. In particular, invalidation of loop
dispositions only does something if there is an expression for the
value, which would not be the case here.
Fix this by using the standard forgetValue() API, instead of rolling a
custom variant.
Fixes https://github.com/llvm/llvm-project/issues/68285.
Unfortunately, the assumption underlying this optimization is
incorrect for getSCEVAtScope(): A SCEVUnknown instruction with
operands that have constant loop exit values can evaluate to
a constant, thus creating a dependency from an "always unknown"
instruction.
Losing this optimization is quite unfortunate, but it doesn't
seem like there is any simple workaround for this.
Fixes #68260.
This reverts commit 3ddd1ffb72.
The only thing we care about here is that we don't exit on the
first iteration. Whether the BTC is large enough to overflow the
signed integer space is not relevant.
SCEVExpander tries to reuse existing instructions with the same
SCEV expression. However, doing this replacement blindly is not
safe, because the instruction might be more poisonous.
What we were already doing is dropping poison-generating flags on
the reused instruction. But this is not the only way that more
poison can be introduced. The poison-generating flag might not
be directly on the reused instruction, or the poison contribution
might come from something like 0 * %var, which folds to 0 but can
still introduce poison.
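A sketch of that second case (hypothetical values): SCEV folds the
multiply to zero, so both instructions below have the SCEV of %x, yet
the second is poison whenever %var is:

    %zero  = mul i32 %var, 0    ; folds to 0 in SCEV, but poison if %var is
    %reuse = add i32 %x, %zero  ; same SCEV as %x, strictly more poisonous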
This patch fixes the issue in a principled way, by determining which
values can contribute poison to the SCEV expression, and then
checking whether any additional values can contribute poison to the
instruction being reused. Poison-generating flags are dropped if
doing that enables reuse.
This is a pretty big hammer and does cause some regressions in
tests, but less than I would have expected. I wasn't able to come
up with a less intrusive fix that still satisfies the correctness
requirements.
Fixes https://github.com/llvm/llvm-project/issues/63763.
Fixes https://github.com/llvm/llvm-project/issues/63926.
Fixes https://github.com/llvm/llvm-project/issues/64333.
Fixes https://github.com/llvm/llvm-project/issues/63727.
Differential Revision: https://reviews.llvm.org/D158181
This allows use with non-0 address space stacks. llvm_ptr_ty should
never be used. This could use some more percolation up through MLIR,
but this is enough to fix existing tests.
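For context, a sketch of what a non-0 address space stack object looks
like (AMDGPU, for example, puts allocas in address space 5):

    %slot = alloca i32, align 4, addrspace(5)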
https://reviews.llvm.org/D156666
replaceCongruentIVs analysis is based on ScalarEvolution; this makes
comparing different PHIs and performing the replacement straightforward.
However, it can have some side-effects: it isn't aware whether an
induction variable is in canonical form, so it can perform replacements
which obscure the meaning of the IR.
In test22 in widen-loop-comp.ll, the resulting loop can't be analyzed by
ScalarEvolution at all.
My attempted solution is to restrict the transform: don't try to replace
induction variables using PHI nodes that don't represent simple
induction variables.
I'm not sure if this is the best solution; suggestions welcome.
Differential Revision: https://reviews.llvm.org/D121950
The backstory is that the LCSSA invalidation we perform here is not
really necessary from a SCEV perspective. However, other code may
rely on the fact that invalidating only LCSSA phi nodes is sufficient
for transforms like loop peeling
(see https://reviews.llvm.org/D149331#4398582 for more details).
However, performing invalidation during LCSSA construction also
means that SCEV expansion (which may need to construct LCSSA) can
invalidate SCEV, which is somewhat unexpected and code may not be
prepared to deal with it (see the added test case, reported at
https://reviews.llvm.org/D149435#4428219).
Instead of invalidating SCEV, ensure that the LCSSA phi node also
has cached SCEV if the original instruction did. This means that
later invalidation of LCSSA phi nodes will work as expected. This
should avoid both the above issues and be more efficient.
Differential Revision: https://reviews.llvm.org/D153145
Make ValueTracking directly call the KnownBits shift helpers, which
provides more precise results.
Unfortunately, ValueTracking has a special case where sometimes we
determine non-zero shift amounts using isKnownNonZero(). I have my
doubts about the usefulness of that special case (it is only exercised
by a single unit test), but I've reproduced it via an extra parameter
to the KnownBits methods.
Differential Revision: https://reviews.llvm.org/D151816
This is a follow-up to b71edfaa4e
since I forgot the lit.local.cfg files in that one.
Reformatting is done with `black`.
If you end up having problems merging this commit because you
have made changes to a Python file, the best way to handle that
is to run `git checkout --ours <yourfile>` and then reformat it
with `black`.
If you run into any problems, post to Discourse about it and
we will try to help.
RFC Thread below:
https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style
Reviewed By: barannikov88, kwk
Differential Revision: https://reviews.llvm.org/D150762
While pointers in address space 7 (128-bit rsrc + 32-bit offset)
should be rewritten out of the code before IR translation on AMDGPU,
higher-level analyses may still call MVT-returning hooks like
getPointerTy() on the target machine. Currently, since there is no
MVT::i160, such calls end up causing crashes.
The changes to the data layout that caused such crashes were D149776.
This patch causes getPointerTy() to return MVT::v5i32 and
getPointerMemTy() to return MVT::v8i32. These are accurate types,
but mean that we can't use vectors of address space 7 pointers during
codegen. This is mostly OK, since vectors of buffers aren't supported
in LPC anyway, but it's a noticeable limitation.
Potential alternative solutions include adjusting getPointerTy() to return
an EVT or adding MVT::i160 and MVT::i256, both of which are rather
disruptive to the rest of the compiler.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D150002
Per discussion at
https://discourse.llvm.org/t/representing-buffer-descriptors-in-the-amdgpu-target-call-for-suggestions/68798,
we define two new address spaces for AMDGCN targets.
The first is address space 7, a non-integral address space (which was
already in the data layout) that has 160-bit pointers (which are
256-bit aligned) and uses a 32-bit offset. These pointers combine a
128-bit buffer descriptor and a 32-bit offset, and will be usable with
normal LLVM operations (load, store, GEP). However, they will be
rewritten out of existence before code generation.
The second of these is address space 8, the address space for "buffer
resources". These will be used to represent the resource arguments to
buffer instructions, and new buffer intrinsics will be defined that
take ptr addrspace(8) instead of <4 x i32> as their resource
arguments. These pointers are 128 bits long (with the same alignment).
They must not be used as arguments to getelementptr or otherwise used
in address computations, since they can have arbitrarily complex
inherent addressing semantics that can't be represented in LLVM. Even
though, like their address space 7 cousins, these pointers have
deterministic ptrtoint/inttoptr semantics, they are defined to be
non-integral in order to prevent optimizations that rely on pointers
being values in [0, addr_max] from applying to them.
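A brief sketch of the intended contrast (hypothetical names):

    ; address space 7: normal memory operations are allowed (until the
    ; late rewrite removes them)
    %p = getelementptr i32, ptr addrspace(7) %fat, i32 %i
    %v = load i32, ptr addrspace(7) %p
    ; address space 8: resource handles only; no getelementptr or other
    ; address arithmetic on ptr addrspace(8) values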
Future work includes:
- Defining new buffer intrinsics that take ptr addrspace(8) resources.
- A late rewrite to turn address space 7 operations into buffer
intrinsics and offset computations.
This commit also updates the "fallback address space" for buffer
intrinsics to the buffer resource, and updates the alias analysis
table.
Depends on D143437
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D145441
The highest address at which the object can start is ObjSize bytes
before the end of the address space (the unsigned max value). If this
value is not a multiple of the alignment, the last possible start
address is the next lowest multiple of the alignment. Note that the
computations cannot overflow, because if they did, there would be no
possible start address for the object.
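A worked example under these rules (hypothetical 8-bit address space,
ObjSize = 5, alignment 4):

    highest possible start        = 2^8 - 5 = 251
    rounded down to the alignment = 248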
At the moment, this is limited to GlobalVariables, because I could not
find an API similar to getObjectSize to also get the alignment of the
object. With such an API, this could be generalized to arbitrary
addresses.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D149483
Sometimes a phi can both be trivial and match the
createNodeFromSelectLikePHI() fold. In that case it is generally
more profitable to look through the phi node.
This lifts two TODOs from this function, allowing us to prove
no-overflow whether it happens through max int (up) or through
min int (down) for both add and sub.
Differential Revision: https://reviews.llvm.org/D148618
Reviewed By: dmakogon