As a preliminary to making DIL the default implementation for
'frame var', I ran check-lldb with 'frame var' forced to always use DIL,
and discovered a few failing tests. This fixes most of them. The only
remaining failure is TestDAP_evaluate.py, which now passes
a test case that the test says should fail (still investigating this).
Changes in this PR:
- Sets the correct VariableSP, in addition to returning the ValueObjectSP
(needed for several watchpoint tests).
- Updates the error messages emitted when looking up members to match what
the rest of LLDB expects, and updates the corresponding DIL tests to expect
the updated error messages.
- Updates the DIL parser to look for and accept "(anonymous namespace)::" at
the front of a variable name (see the sketch below).
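For illustration only (this is not one of the PR's test cases, and the names are hypothetical), a variable in an anonymous namespace is reported by LLDB with exactly the "(anonymous namespace)::" prefix that the parser now accepts:

```cpp
// Hypothetical debuggee: `g_counter` lives in an anonymous namespace, so LLDB
// presents its fully qualified name as "(anonymous namespace)::g_counter".
namespace {
int g_counter = 42;
} // namespace

int main() {
  // With a breakpoint here, `frame var "(anonymous namespace)::g_counter"`
  // should now resolve through DIL as well.
  return g_counter;
}
```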
This PR adds a mechanism so that downstream consumers can pass in
control functions governing the application of these patterns. This change
shouldn't affect any consumers of this method that do not specify a
controlFn. The controlFn always receives the source operand of the consumer
in each of the patterns as a parameter.
In IREE, we (will) use it to prevent folding patterns that
would inhibit fusion. See IREE issue
[#20896](https://github.com/iree-org/iree/issues/20896) for more
details.
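As a rough sketch of what such a control function looks like (the type alias and the attribute name below are assumptions for illustration, not the actual upstream API):

```cpp
#include <functional>

#include "mlir/IR/Operation.h"

// The callback receives the source operand of the consumer in each pattern
// and returns false to skip applying that pattern.
using ControlFn = std::function<bool(mlir::OpOperand *)>;

// Example downstream policy: skip folding when the consuming op is tagged as
// part of a fusion group (the attribute name is purely illustrative).
static const ControlFn skipFusedConsumers = [](mlir::OpOperand *sourceOperand) {
  return !sourceOperand->getOwner()->hasAttr("fusion_group");
};
```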
Changes from Commit 40aab0412f "[test]
Migrate -gcc-toolchain with space separator to --gcc-toolchain=" made
two previously different RUN lines equal.
Remove one RUN line.
I found this peculiar comment in EarlyCSE:
1c78d8d9d7/llvm/lib/Transforms/Scalar/EarlyCSE.cpp (L1620-L1624)
Looking back over history, this seems to be referring to the
aarch64.neon.stN intrinsics, which are indeed not marked writeonly
(though the ldN intrinsics are readonly).
Possibly I'm missing something special about these intrinsics, but I
think it is safe to mark them as writeonly.
In both `bubbleUpPackOpThroughGenericOp()` and
`pushDownUnPackOpThroughGenericOp()`, we can simplify the lowered IR by
removing the pack of an empty tensor when the init tensor isn't used in the
generic op. Instead of packing an empty tensor, the empty tensor can be
forwarded to the generic output. This produces a cleaner result after data
layout propagation.
This pass reifies the shapes of a subset of
`ReifyRankedShapedTypeOpInterface` ops with `tensor` results.
The pass currently only supports result shape type reification for:
- tensor::PadOp
- tensor::ConcatOp
It addresses a representation gap where implicit op semantics are needed
to infer static result types from dynamic
operands. But it does so by using `ReifyRankedShapedTypeOpInterface` as
the source of truth rather than the op itself.
As a consequence, this cannot generalize today.
TODO: in the future, we should consider coupling this information with
op "transfer functions" (e.g.
`IndexingMapOpInterface`) to provide a source of truth that can work
across result shape inference, canonicalization and
op verifiers.
The pass replaces the operations with their reified versions when more
static information can be derived, and inserts
casts when result shapes are updated.
Example:
```mlir
#map = affine_map<(d0) -> (-d0 + 256)>
func.func @func(%arg0: f32, %arg1: index, %arg2: tensor<64x?x64xf32>) -> tensor<1x?x64xf32> {
  %0 = affine.apply #map(%arg1)
  %extracted_slice = tensor.extract_slice %arg2[0, 0, 0] [1, %arg1, 64] [1, 1, 1] : tensor<64x?x64xf32> to tensor<1x?x64xf32>
  %padded = tensor.pad %extracted_slice low[0, 0, 0] high[0, %0, 0] {
  ^bb0(%arg3: index, %arg4: index, %arg5: index):
    tensor.yield %arg0 : f32
  } : tensor<1x?x64xf32> to tensor<1x?x64xf32>
  return %padded : tensor<1x?x64xf32>
}

// mlir-opt --reify-result-shapes

#map = affine_map<()[s0] -> (-s0 + 256)>
func.func @func(%arg0: f32, %arg1: index, %arg2: tensor<64x?x64xf32>) -> tensor<1x?x64xf32> {
  %0 = affine.apply #map()[%arg1]
  %extracted_slice = tensor.extract_slice %arg2[0, 0, 0] [1, %arg1, 64] [1, 1, 1] : tensor<64x?x64xf32> to tensor<1x?x64xf32>
  %padded = tensor.pad %extracted_slice low[0, 0, 0] high[0, %0, 0] {
  ^bb0(%arg3: index, %arg4: index, %arg5: index):
    tensor.yield %arg0 : f32
  } : tensor<1x?x64xf32> to tensor<1x256x64xf32>
  %cast = tensor.cast %padded : tensor<1x256x64xf32> to tensor<1x?x64xf32>
  return %cast : tensor<1x?x64xf32>
}
```
---------
Co-authored-by: Fabian Mora <fabian.mora-cordero@amd.com>
Fix a couple of unhandled edge cases in offload-tblgen that were found
by static analysis (a sketch of both guards follows the list):
* `LineStart` may wrap around to 0 when processing multi-line strings.
The value is not actually used in that case, but it is still better to
handle it explicitly.
* Possible unchecked nullptr when processing parameter flags
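A hedged sketch of the two guards (the names and types are illustrative, not the actual offload-tblgen code):

```cpp
#include <cstddef>

// Guard 1: `LineStart` can wrap around to 0 when processing multi-line
// strings; the value is unused in that case, but check for it explicitly.
static bool isValidLineStart(size_t LineStart) {
  return LineStart != 0;
}

// Guard 2: check the parameter-flags pointer before dereferencing it.
struct FlagsRecord; // opaque placeholder for the real record type
static bool hasParamFlags(const FlagsRecord *Flags) {
  return Flags != nullptr; // previously assumed non-null
}
```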
This PR introduces support for the DWARF64 format, enabling handling of
64-bit DWARF sections as defined by the DWARF specification. The update
includes adjustments to header parsing and modification of form values
to accommodate 64-bit offsets and values.
A test case verifying the DWARF64 format is also added.
This implements lowering for the async, wait, if, and if_present clauses
(as well as device_type, but that is a detail of async/wait). All of
these are implemented the same way they are for the compute constructs,
so this is a fairly small set of changes.
Avoid constructing an invalid ConstantRange when Offset + Length for a memset
overflows the signed 64-bit integer range. This prevents assertion failures
when inferring the initializes attribute.
Fixes #140345
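A minimal sketch of the overflow guard, assuming non-negative Offset and Length (illustrative only, not the actual attribute-inference code):

```cpp
#include <cstdint>
#include <limits>
#include <optional>
#include <utility>

// Only form the half-open range [Offset, Offset + Length) when the end point
// still fits in a signed 64-bit integer; otherwise skip the attribute.
static std::optional<std::pair<int64_t, int64_t>>
makeInitializedRange(int64_t Offset, int64_t Length) {
  if (Length > std::numeric_limits<int64_t>::max() - Offset)
    return std::nullopt; // Offset + Length would overflow int64_t
  return std::make_pair(Offset, Offset + Length);
}
```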
In addBranchWeightToMiddleTerminator we attempt to add branch weights to
the middle block terminator. We pessimistically assume vscale=1, whereas
we can improve the estimate by using the value of vscale used for
tuning.
This will convert loads of constant strings to immediate values. Put
this behind a flag that is enabled by default so that we can toggle it
if need be.
CTTZ/CTLZ_ZERO_UNDEF nodes can only create poison if the source value is zero, so check this with isKnownNeverZero.
Pulled out of #146361 and reapplied now that #146490 has landed.
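Roughly, the check amounts to the following (a simplified sketch, not the actual SelectionDAG code):

```cpp
#include "llvm/CodeGen/ISDOpcodes.h"
#include "llvm/CodeGen/SelectionDAG.h"

// *_ZERO_UNDEF nodes only produce poison when the source value may be zero,
// so the poison-freedom query can defer to isKnownNeverZero.
static bool mayCreatePoison(const llvm::SelectionDAG &DAG, llvm::SDValue Op) {
  switch (Op.getOpcode()) {
  case llvm::ISD::CTTZ_ZERO_UNDEF:
  case llvm::ISD::CTLZ_ZERO_UNDEF:
    return !DAG.isKnownNeverZero(Op.getOperand(0));
  default:
    return true; // conservative default for this sketch
  }
}
```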
Following on from #118638, this handles widened induction variables with
EVL tail folding by setting the VF operand to be EVL, calculated in the
vector body.
We need to do this for correctness since with EVL tail folding the
number of elements processed in the penultimate iteration may not be VF,
but the runtime EVL, and we need to take this into account when updating
the backedge value.
- Because the VF may now not be a live-in, we need to move the insertion
point to just after the VF's definition.
- We also need to avoid truncating it when it's the same size as the
step type; previously this wasn't a problem for live-ins.
- Also, because the VF may be smaller than the IV type (the EVL is always
i32), we may need to zext it.
On -march=rva23u64 -O3 we get 87.1% more loops vectorized on TSVC, and
42.8% more loops vectorized on SPEC CPU 2017.
When declaring multiple 1-exabyte arrays in a struct, a member offset can
exceed 2 EB, causing the struct size to be reported incorrectly (as only
1 EB). This fix ensures an error is emitted, preventing the generation of
incorrect assembly. #60272
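For illustration (the member sizes here are assumptions, not taken from the issue), a struct along these lines pushes member offsets past 2 EB:

```cpp
// Each member is 2^60 bytes (1 EiB), so the offset of `c` is 2 EiB and the
// total size is 3 EiB; previously this produced an incorrect size instead of
// a diagnostic.
struct Huge {
  char a[1ULL << 60];
  char b[1ULL << 60];
  char c[1ULL << 60];
};
```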
This keeps getting forgotten (e.g. #66603), so make a point of adding
it here explicitly instead of relying on the implicit default of
returning true.
Previously, references to regions and successors were incorrectly disallowed outside the top-level assembly form. This change enables the use of bound regions and successors as variables in custom directives.
Although nice to have to prove that the freeze can be moved, this can fail
immediately after the freeze(op(...)) -> op(freeze(), freeze(), ...) creation
if any of the new freeze nodes now prevents value tracking from seeing
through to the source values (e.g. that shift amounts/element indices are in
bounds, etc.).
This will allow us to remove the isGuaranteedNotToBeUndefOrPoison checks
inside canCreateUndefOrPoison that were discussed on #146361.
Firstly, this commit requires that all types are signless in the strict
mode of the validation pass. This is because signless types on
operations are required by the TOSA specification. The "strict" mode in
the validation pass is the final check for TOSA conformance to the
specification, which can often be used for conversion to other formats.
In addition, a conversion pass `--tosa-convert-integer-type-to-signless`
is provided to allow a user to convert all integer types to signless.
The intention is that this pass can be run before the validation pass.
Following use of this pass, input/output information should be carried
independently by the user.
When compiling with `-march=armv9-a+nosve` we found that Clang still
defines the `__ARM_FEATURE_SVE2` macro, which is explicitly set in
`setArchFeatures` when compiling for armv9-a.
After some experimenting, I found that the list of features passed
into `AArch64TargetInfo::handleTargetFeatures` has already been expanded:
it takes `+no[feature]` into account and has already expanded features
like `armv9-a`.
From that I conclude that `setArchFeatures` is no longer required.