This is the intrinsic version of #146349, and handles fabs as well as
other intrinsics.
It's largely a copy of InstCombinerImpl::foldShuffledIntrinsicOperands
but a bit simpler since we don't need to find a common mask.
Creating a separate function seems cleaner than trying to shoehorn
it into the existing one.
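A standalone sketch (not the InstCombine code itself) of why the fold is
sound for an elementwise intrinsic like fabs: applying the intrinsic before
or after the shuffle yields the same elements.
```cpp
#include <array>
#include <cmath>
#include <cstdio>

int main() {
  const std::array<double, 4> X = {-1.0, 2.0, -3.0, 4.0};
  const std::array<int, 4> Mask = {3, 2, 1, 0};

  // fabs(shuffle(x, mask))
  std::array<double, 4> FabsOfShuffle;
  for (int I = 0; I < 4; ++I)
    FabsOfShuffle[I] = std::fabs(X[Mask[I]]);

  // shuffle(fabs(x), mask)
  std::array<double, 4> Fabs, ShuffleOfFabs;
  for (int I = 0; I < 4; ++I)
    Fabs[I] = std::fabs(X[I]);
  for (int I = 0; I < 4; ++I)
    ShuffleOfFabs[I] = Fabs[Mask[I]];

  std::printf("%d\n", FabsOfShuffle == ShuffleOfFabs); // prints 1
}
```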
Delay the erasure of an op, so that the insertion point of the rewriter
remains valid.
This commit is in preparation for the One-Shot Dialect Conversion
refactoring. (The current implementation works with the current dialect
conversion driver because op erasure is delayed.)
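A generic illustration of the delayed-erasure pattern (a sketch only, not
the dialect conversion API; the `Rewriter` interface here is assumed):
```cpp
#include <vector>

// Defer erasure until all rewriting is done, so that no insertion point can
// still refer to an op that has already been erased.
template <typename Op, typename Rewriter>
void rewriteAll(const std::vector<Op *> &Ops, Rewriter &R) {
  std::vector<Op *> ToErase;
  for (Op *O : Ops)
    if (R.rewrite(O))       // may leave insertion points at or near O
      ToErase.push_back(O); // do not erase yet
  for (Op *O : ToErase)
    R.erase(O);             // safe: all rewriting has finished
}
```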
This patch is part of an effort to remove the
`ResolveSDKPathFromDebugInfo` method, and more specifically the variant
which takes a Module as argument.
See the following PR for a follow-up on what to do:
- https://github.com/llvm/llvm-project/pull/144913.
---------
Co-authored-by: Michael Buch <michaelbuch12@gmail.com>
Add a flag to tco for emitting the final MLIR, prior to lowering to LLVM
IR. This is intended to produce output that can be passed directly to
mlir-translate.
---------
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
Summary:
This adds a basic outline for adding 'conformance' tests. These are
tests that are intended to check device code against a standard. In this
case, we will expect this to be filled with math conformance tests to
make sure their results are within the ULP requirements we demand.
Right now this just *assumes* the GPU libc is there, meaning you'll
likely need to do a manual `ninja` before doing `ninja -C
runtimes/runtimes-bins offload.conformance`.
Given an atomic operation `w = max(w, x1, x2, ...)`, rewrite it as `w =
max(w, max(x1, x2, ...))`. This will avoid unnecessary non-atomic
comparisons inside the atomic operation (min/max are expanded
inline).
In particular, if some of the x_i's are optional dummy parameters in the
containing function, this will avoid any presence tests within the
atomic operation.
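A minimal C++ sketch of the rewritten shape (illustrative only, not the
generated code), assuming integer max:
```cpp
#include <algorithm>
#include <atomic>

void atomicMax3(std::atomic<int> &W, int X1, int X2) {
  int Rhs = std::max(X1, X2); // max(x1, x2): plain, non-atomic code
  int Old = W.load();
  // Atomic part reduced to a single two-operand max: w = max(w, rhs).
  while (!W.compare_exchange_weak(Old, std::max(Old, Rhs))) {
    // On failure, Old is refreshed and the max is recomputed.
  }
}
```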
Fixes https://github.com/llvm/llvm-project/issues/144838
std::prev(Paired) returns the previous instruction, which might skip
over the instructions in a bundle to the BUNDLE instruction itself. Change
it to Paired->getPrevNode() to make sure we update the registers in each
instruction in the bundle.
This unbreaks the C++20 buildbot that has been broken since
402baea0a9.
With implicit conversions in C++20 compilation mode, the spaceship
operator will unintentionally be based on `operator bool`:
```cpp
auto foo(Location L, Location R) {
  return L <=> R;
  // Equivalent to the following line due to implicit conversions.
  // return L.operator bool() <=> R.operator bool();
}
```
The spaceship operator is rarely used explicitly, but its implicit uses
in the STL may cause surprising results, as exposed by the use of `std::tie`
in 402baea0a9, which ended up changing the
comparison results unintentionally.
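A minimal standalone reproduction of the pitfall, using a stand-in `Loc`
type in place of `Location`:
```cpp
#include <compare>

struct Loc {
  int Id = 0;
  constexpr operator bool() const { return Id != 0; }
};

// No operator<=> is declared for Loc, yet this compiles in C++20: both
// operands implicitly convert via bool first, so two distinct locations
// unexpectedly compare equal.
static_assert((Loc{1} <=> Loc{2}) == 0);
```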
Add a convenience wrapper struct for the `bit_value_t` enum type to host
various constructors, queries, and printing support. Also refactor related
code in several places. In `getBitsField`, use `llvm::append_range` and
`SmallVector::append()` and eliminate manual loops. Eliminate
`emitNameWithID` and instead use the `operator<<` that does the same
thing as this function. Have `BitValue::getValue()` (the replacement for
`Value`) return `std::optional<>` instead of -1 for unset bits. Terminate
with a fatal error when a decoding conflict is encountered.
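A hypothetical sketch of the wrapper's shape (names and enumerators here
are illustrative, not copied from the patch):
```cpp
#include <cstdint>
#include <optional>

struct BitValue {
  enum bit_value_t { BIT_FALSE, BIT_TRUE, BIT_UNSET, BIT_UNFILTERED } V;

  BitValue(bit_value_t V) : V(V) {}

  bool isSet() const { return V == BIT_FALSE || V == BIT_TRUE; }

  // Replacement for `Value`: unset bits are std::nullopt rather than -1.
  std::optional<uint64_t> getValue() const {
    if (isSet())
      return V == BIT_TRUE ? 1 : 0;
    return std::nullopt;
  }
};
```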
As a preliminary to making DIL the default implementation for
'frame var', I ran check-lldb forcing 'frame var' to always use DIL
and discovered a few failing tests. This fixes most of them. The only
remaining failing test is TestDAP_evaluate.py, which now passes
a test case that the test says should fail (still investigating this).
Changes in this PR:
- Sets the correct VariableSP, as well as returning the ValueObjectSP
(needed for several watchpoint tests).
- Updates error messages, when looking up members, to match what the
rest of LLDB expects. Also updates the appropriate DIL tests to expect
the updated error messages.
- Updates DIL parser to look for and accept "(anonymous namespace)::" at
the front of a variable name.
This PR adds a mechanism so that downstream consumers can pass in
control functions for the application of these patterns. This change
shouldn't affect any consumers of this method that do not specify a
controlFn. The controlFn always receives the source operand of the consumer
in each of the patterns as a parameter.
In IREE, we (will) use it to prevent folding patterns that
would inhibit fusion. See IREE issue
[#20896](https://github.com/iree-org/iree/issues/20896) for more
details.
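A generic sketch of the mechanism (the `ControlFn` shape here is assumed,
not the exact MLIR signature):
```cpp
#include <functional>

// Each pattern consults a caller-provided control function with the source
// operand of the consumer; a null function keeps the old, unconditional
// behavior.
template <typename OpOperandT>
struct ControlledPattern {
  std::function<bool(OpOperandT *)> ControlFn;

  bool canApply(OpOperandT *SourceOperand) const {
    return !ControlFn || ControlFn(SourceOperand);
  }
};
```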
Commit 40aab0412f "[test]
Migrate -gcc-toolchain with space separator to --gcc-toolchain=" made
two previously different RUN lines identical.
Remove one of them.
I found this peculiar comment in EarlyCSE:
1c78d8d9d7/llvm/lib/Transforms/Scalar/EarlyCSE.cpp (L1620-L1624)
Looking back over history, this seems to be referring to the
aarch64.neon.stN intrinsics, which are indeed not marked writeonly
(though the ldN intrinsics are readonly).
Possibly I'm missing something special about these intrinsics, but I
think it is safe to mark them as writeonly.
In both `bubbleUpPackOpThroughGenericOp()` and
`pushDownUnPackOpThroughGenericOp()`, we can simplify the lowered IR by
removing the pack of an empty tensor when the init tensor isn't used in
the generic op. Instead of packing an empty tensor, the empty tensor can
be forwarded to the generic output. This produces a cleaner result after
data layout propagation.
This pass reifies the shapes of a subset of
`ReifyRankedShapedTypeOpInterface` ops with `tensor` results.
The pass currently only supports result shape type reification for:
- tensor::PadOp
- tensor::ConcatOp
It addresses a representation gap where implicit op semantics are needed
to infer static result types from dynamic
operands. But it does so by using `ReifyRankedShapedTypeOpInterface` as
the source of truth rather than the op itself.
As a consequence, this cannot generalize today.
TODO: in the future, we should consider coupling this information with
op "transfer functions" (e.g.
`IndexingMapOpInterface`) to provide a source of truth that can work
across result shape inference, canonicalization and
op verifiers.
The pass replaces the operations with their reified versions when more
static information can be derived, and inserts
casts when result shapes are updated.
Example:
```mlir
#map = affine_map<(d0) -> (-d0 + 256)>
func.func @func(%arg0: f32, %arg1: index, %arg2: tensor<64x?x64xf32>) -> tensor<1x?x64xf32> {
  %0 = affine.apply #map(%arg1)
  %extracted_slice = tensor.extract_slice %arg2[0, 0, 0] [1, %arg1, 64] [1, 1, 1] : tensor<64x?x64xf32> to tensor<1x?x64xf32>
  %padded = tensor.pad %extracted_slice low[0, 0, 0] high[0, %0, 0] {
  ^bb0(%arg3: index, %arg4: index, %arg5: index):
    tensor.yield %arg0 : f32
  } : tensor<1x?x64xf32> to tensor<1x?x64xf32>
  return %padded : tensor<1x?x64xf32>
}

// mlir-opt --reify-result-shapes
#map = affine_map<()[s0] -> (-s0 + 256)>
func.func @func(%arg0: f32, %arg1: index, %arg2: tensor<64x?x64xf32>) -> tensor<1x?x64xf32> {
  %0 = affine.apply #map()[%arg1]
  %extracted_slice = tensor.extract_slice %arg2[0, 0, 0] [1, %arg1, 64] [1, 1, 1] : tensor<64x?x64xf32> to tensor<1x?x64xf32>
  %padded = tensor.pad %extracted_slice low[0, 0, 0] high[0, %0, 0] {
  ^bb0(%arg3: index, %arg4: index, %arg5: index):
    tensor.yield %arg0 : f32
  } : tensor<1x?x64xf32> to tensor<1x256x64xf32>
  %cast = tensor.cast %padded : tensor<1x256x64xf32> to tensor<1x?x64xf32>
  return %cast : tensor<1x?x64xf32>
}
```
---------
Co-authored-by: Fabian Mora <fabian.mora-cordero@amd.com>
Fix a couple of unhandled edge cases in offload-tblgen that were found
by static analysis:
* `LineStart` may wrap around to 0 when processing multi-line strings.
The value is not actually used in that case, but it is still better to
handle it explicitly.
* Possible unchecked nullptr when processing parameter flags
This PR introduces support for the DWARF64 format, enabling handling of
64-bit DWARF sections as defined by the DWARF specification. The update
includes adjustments to header parsing and modification of form values
to accommodate 64-bit offsets and values.
Also added a test case to verify the DWARF64 format.
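A sketch of the core DWARF64 detection rule from the DWARF specification: a
32-bit initial length of 0xffffffff is an escape indicating that a 64-bit
unit length follows (the helper below is illustrative, not the patch's
actual parser):
```cpp
#include <cstdint>
#include <cstring>

struct InitialLength {
  uint64_t Length;
  bool IsDWARF64;
};

InitialLength readInitialLength(const uint8_t *&P) {
  uint32_t Len32;
  std::memcpy(&Len32, P, sizeof(Len32));
  P += sizeof(Len32);
  if (Len32 == 0xffffffffU) { // DWARF64 escape value
    uint64_t Len64;
    std::memcpy(&Len64, P, sizeof(Len64));
    P += sizeof(Len64);
    return {Len64, /*IsDWARF64=*/true};
  }
  // Values 0xfffffff0..0xfffffffe are reserved by the spec.
  return {Len32, /*IsDWARF64=*/false};
}
```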
This implements the async, wait, if, and if_present lowering (as well as
device_type, but that is a detail of async/wait). All of
these are implemented the same way they are for the compute constructs,
so this is a fairly mild set of changes.
Avoid constructing invalid ConstantRange when Offset + Length in memset
overflows signed 64-bit integer space. This prevents assertion failures
when inferring the initializes attribute.
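A sketch of the guard (the helper name is hypothetical; it relies on the
Clang/GCC `__builtin_add_overflow` builtin):
```cpp
#include <cstdint>

// The range [Offset, Offset + Length) may only be formed when the end point
// does not overflow a signed 64-bit integer.
bool canFormInitializesRange(int64_t Offset, int64_t Length) {
  int64_t End;
  return !__builtin_add_overflow(Offset, Length, &End);
}
```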
Fixes #140345
In addBranchWeightToMiddleTerminator we attempt to add branch weights to
the middle block terminator. We pessimistically assume vscale=1, whereas
we can improve the estimate by using the value of vscale used for
tuning.
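A worked example with assumed numbers (the function name is illustrative):
```cpp
// For a scalable VF of (vscale x 4) and a vscale-for-tuning value of 4, the
// estimate is 16 lanes per vector iteration, not the pessimistic 4 obtained
// by assuming vscale = 1.
unsigned estimateRuntimeVF(unsigned KnownMinVF, unsigned VScaleForTuning) {
  return KnownMinVF * VScaleForTuning; // 4 * 4 = 16
}
```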
This will convert loads of constant strings to immediate values. Put
this behind a flag that is enabled by default so that we can toggle it
if need be.
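A conceptual illustration (standalone C++, not the transform itself): a
fixed-width load from a constant string has a known value at compile time,
so it can be rewritten as an immediate.
```cpp
#include <cstdint>
#include <cstring>

static const char Str[] = "ABCD";

uint32_t loadWord() {
  uint32_t V;
  std::memcpy(&V, Str, sizeof(V)); // foldable to 0x44434241 (little-endian)
  return V;
}
```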
CTTZ/CTLZ_ZERO_UNDEF nodes can only create poison if the source value is zero, so check with isKnownNeverZero.
Pulled out of #146361 and reapplied now that #146490 has landed.
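An illustration of the reasoning (using the Clang/GCC `__builtin_ctz`
builtin as a stand-in for the CTTZ node):
```cpp
// Counting trailing zeros is only undefined/poison for a zero input, so a
// value proven nonzero (isKnownNeverZero) cannot yield poison.
unsigned cttzNonZero(unsigned X) {
  // Precondition established by the caller: X != 0.
  return __builtin_ctz(X);
}
```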
Following on from #118638, this handles widened induction variables with
EVL tail folding by setting the VF operand to be EVL, calculated in the
vector body.
We need to do this for correctness, since with EVL tail folding the
number of elements processed in the penultimate iteration may not be VF
but the runtime EVL, and we need to take this into account when updating
the backedge value.
- Because the VF may now not be a live-in, we need to move the insertion
point to just after the VF's definition.
- We also need to avoid truncating it when it's the same size as the
step type; previously this wasn't a problem for live-ins.
- Also, because the VF may be smaller than the IV type (the EVL is
always i32), we may need to zext it, as in the sketch below.
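A sketch of the updated backedge computation (illustrative types and names):
```cpp
#include <cstdint>

// Step the widened IV by the runtime EVL rather than by VF, zero-extending
// the i32 EVL to the IV's width before scaling by the step.
uint64_t nextWidenedIV(uint64_t IV, uint32_t EVL, uint64_t Step) {
  return IV + static_cast<uint64_t>(EVL) * Step; // zext i32 EVL to i64
}
```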
On -march=rva23u64 -O3 we get 87.1% more loops vectorized on TSVC, and
42.8% more loops vectorized on SPEC CPU 2017.
When declaring multiple arrays of 1 exabyte each in a struct, the offset
can exceed 2 EB, causing the struct size to be reported incorrectly (as
only 1 EB). This fix ensures an error is emitted, preventing the
generation of incorrect assembly (#60272).
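A hypothetical reproduction (sizes chosen for illustration): eight 1 EiB
(2^60 byte) arrays give a struct size of 2^63 bytes, which overflows a
signed 64-bit offset.
```cpp
// Expected behavior after the fix: the compiler reports an error instead of
// silently emitting a truncated struct size.
struct Huge {
  char A0[1ULL << 60], A1[1ULL << 60], A2[1ULL << 60], A3[1ULL << 60];
  char A4[1ULL << 60], A5[1ULL << 60], A6[1ULL << 60], A7[1ULL << 60];
};
```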