This patch enhances the PtrReplacer as follows:
1. Users are now collected iteratively to be generous on the stack. In
the case of PHIs with incoming values which have not yet been visited,
they are pushed back into the stack for reconsideration.
2. Replace users of the pointer root in a reverse-postorder traversal,
instead of a simple traversal over the collected users. This reordering
ensures that the uses of an instruction are replaced before replacing
the instruction itself.
3. During the replacement of PHI, use the same incoming value if it does
not have a replacement.
This patch specifically fixes the case when an incoming value of a PHI
is addrspacecasted.
This reland PR includes a fix for an assertion failure caused by
https://github.com/llvm/llvm-project/pull/137215, which was reverted.
The failing test involved a phi and gep depending on each other, in
which case the PtrReplacer did not order them correctly for replacement.
This patch fixes it by adding a check during the definition of
`PostOrderWorklist`.
This pr extends `dumpRootElements` to invoke the print methods of all
`RootElement`s now that they are all implemented.
Extends the `RootSignatures-AST.hlsl` testcase to have a root element of
each type being parsed, constructed to the in-memory representation mode
and then being dumped as part of the AST dump.
- Update `HLSLRootSignatureUtils.cpp` to extend `dumpRootElements`
- Extend `AST/HLSL/RootSigantures-AST.hlsl` testcase
- Defines the helper `operator<<` for `RootElement`
- Small correction to the output of `numDescriptors` to be `unbounded`
in special case
Resolves https://github.com/llvm/llvm-project/issues/124595.
This PR fixes broken links in all files describing libc usage modes.
Please let me know if there are any other places that need updating.
---------
Co-authored-by: shubhp@perlmutter <shubhp@perlmutter.com>
In the script that's used by RPC to convert LLDB headers to LLDB RPC
headers, there's a bug with how it converts namespace usage. An
overeager regex pattern caused *all* text before any `lldb::` namespace
usage to get replaced with `lldb_rpc::` instead of just the namespace
itself. This commit changes that regex pattern to be less overeager and
modifies one of the shell tests for this script to actually check that
the namespace usage replacement is working correctly.
rdar://154126268
Running the `vector-combine` pass on this test now produces a single
shuffle on a loaded `<1 x float>` instead of an insert into a `<2 x
float>` followed by a shuffle.
This test change matches changes in other tests in PR #144690, which
introduced the optimization.
If we simplify a udiv/sdiv using the exact flag we shouldn't
propagate that simplifaction to any urem/srem that happens to
use the same operands. If the exact flag is wrong, the udiv/sdiv
will produce poison, but that doesn't mean we can make the urem/srem
simplify to 0.
Fixes#145360.
This patch fixes:
third-party/unittest/googletest/include/gtest/gtest.h:1379:11:
error: comparison of integers of different signs: 'const unsigned
long' and 'const int' [-Werror,-Wsign-compare]
Reverts llvm/llvm-project#143739 because it triggers an assert:
```
Assertion failed: (!isNull() && "Cannot retrieve a NULL type pointer"), function getCommonPtr, file Type.h, line 952.
```
This mirrors incubator changes from https://github.com/llvm/clangir/pull/1678
- Create CIR specific EnumAttr bases and prefix enum attributes with CIR_ that automatically puts enum to cir namespace
- Removes unnecessary enum case definitions
- Unifies naming of enum values to use capitals consistently and make enumerations to start from 0
Following up from https://github.com/llvm/llvm-project/pull/143467,
this PR adds support for
`ReductionTilingStrategy::PartialReductionOuterParallel` to
`tileUsingSCF`. The implementation of
`PartialReductionTilingInterface` for `Linalg` ops has been updated to
support this strategy as well. This makes the `tileUsingSCF` come on
par with `linalg::tileReductionUsingForall` which will be deprecated
subsequently.
Changes summary
- `PartialReductionTilingInterface` changes :
- `tileToPartialReduction` method needed to get the induction
variables of the generated tile loops. This was needed to keep the
generated code similar to `linalg::tileReductionUsingForall`,
specifically to create a simplified access for slicing the
intermediate partial results tensor when tiled in `num_threads` mode.
- `getPartialResultTilePosition` methods needs the induction
varialbes for the generated tile loops for the same reason above,
and also needs the `tilingStrategy` to be passed in to generate
correct code.
The tests in `transform-tile-reduction.mlir` testing the
`linalg::tileReductionUsingForall` have been moved over to test
`scf::tileUsingSCF` with
`ReductionTilingStrategy::PartialReductionOuterParallel`
strategy. Some of the test that were doing further cyclic distribution
of the transformed code from tiling are removed. Those seem like two
separate transformation that were merged into one. Ideally that would
need to happen when resolving the `scf.forall` rather than during
tiling.
Please review only the top commit. Depends on
https://github.com/llvm/llvm-project/pull/143467
Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
LLVM prevents the sm_32_intrinsics.hpp header from being included with a
#define SM_32_INTRINSICS_HPP. It also provides drop-in replacements of
the functions defined in the CUDA header.
One issue is that some intrinsics were added after the replacement was
written, and thus have no replacement, breaking code that calls them
(Raft is one example).
This commit backport the code from sm_32_intrinsics.hpp for the missing
intrinsics.
This is the second try after PR #143664 broke tests.
Evaluating AR at the symbolic max BTC may wrap and create an expression
that is less than the start of the AddRec due to wrapping (for example
consider MaxBTC = -2).
If that's the case, set ScEnd to -(EltSize + 1). ScEnd will get
incremented by EltSize before returning, so this effectively sets ScEnd
to unsigned max. Note that LAA separately checks that accesses cannot
not wrap (52ded67249,
https://github.com/llvm/llvm-project/pull/127543), so unsigned max
represents an upper bound.
When there is a computable backedge-taken count, we are guaranteed to
execute the number of iterations, and if any pointer would wrap it would
be UB (or the access will never be executed, so cannot alias). It
includes new tests from the previous discussion that show a case we wrap
with a BTC, but it is UB due to the pointer after the object wrapping
(in `evaluate-at-backedge-taken-count-wrapping.ll`)
When we have only a maximum backedge taken count, we instead try to use
dereferenceability information to determine if the pointer access must be in
bounds for the maximum backedge taken count.
PR: https://github.com/llvm/llvm-project/pull/128061
Change loop variable 'i' to 'I' to conform to LLVM coding standards.
Variable names should start with an upper case letter according to LLVM
coding guidelines.
Fixed locations:
Lines 103-104: for loop variable and usage
Lines 257-258: for loop variable and usage
Co-authored-by: Shuqi Liang <Shuqi.Liang@ibm.com>
## Summary
A new `totalLoadedDwoFileCount` and `totalDwoFileCount` counters to
available statisctics when calling "statistics dump".
1. `GetDwoFileCounts ` is created, and returns a pair of ints
representing the number of loaded DWO files and the total number of DWO
files, respectively. An override is implemented for `SymbolFileDWARF`
that loops through each compile unit, and adds to a counter if it's a
DWO unit, and then uses `GetDwoSymbolFile(false)` to check whether the
DWO file was already loaded/parsed.
3. In `Statistics`, use `GetSeparateDebugInfo` to sum up the total
number of loaded/parsed DWO files along with the total number of DWO
files. This is done by checking whether the DWO file was already
successfully `loaded` in the collected DWO data, anding adding to the
`totalLoadedDwoFileCount`, and adding to `totalDwoFileCount` for all CU
units.
## Expected Behavior
- When binaries are compiled with split-dwarf and separate DWO files,
`totalLoadedDwoFileCount` would be the number of loaded DWO files and
`totalDwoFileCount` would be the total count of DWO files.
- When using a DWP file instead of separate DWO files,
`totalLoadedDwoFileCount` would be the number of parsed compile units,
while `totalDwoFileCount` would be the total number of CUs in the DWP
file. This should be similar to the counts we get from loading separate
DWO files rather than only counting whether a single DWP file was
loaded.
- When not using split-dwarf, we expect both `totalDwoFileCount` and
`totalLoadedDwoFileCount` to be 0 since no separate debug info is
loaded.
## Testing
**Manual Testing**
On an internal script that has many DWO files, `statistics dump` was
called before and after a `type lookup` command. The
`totalLoadedDwoFileCount` increased as expected after the `type lookup`.
```
(lldb) statistics dump
{
...
"totalLoadedDwoFileCount": 29,
}
(lldb) type lookup folly::Optional<unsigned int>::Storage
typedef std::conditional<true, folly::Optional<unsigned int>::StorageTriviallyDestructible, folly::Optional<unsigned int>::StorageNonTriviallyDestructible>::type
typedef std::conditional<true, folly::Optional<unsigned int>::StorageTriviallyDestructible, folly::Optional<unsigned int>::StorageNonTriviallyDestructible>::type
...
(lldb) statistics dump
{
...
"totalLoadedDwoFileCount": 2160,
}
```
**Unit test**
Added three unit tests that build with new "third.cpp" and "baz.cpp"
files. For tests with w/ flags `-gsplit-dwarf -gpubnames`, this
generates 2 DWO files. Then, the test incrementally adds breakpoints,
and does a type lookup, and the count should increase for each of these
as new DWO files get loaded to support these.
```
$ bin/lldb-dotest -p TestStats.py ~/llvm-sand/external/llvm-project/lldb/test/API/commands/statistics/basic/
----------------------------------------------------------------------
Ran 20 tests in 211.738s
OK (skipped=3)
```
This is a precursor to generalizing the `tileUsingSCF` to handle
`ReductionTilingStrategy::PartialOuterParallel` strategy. This change
itself is generalizing/refactoring the current implementation that
supports only `ReductionTilingStrategy::PartialOuterReduction`.
Changes in this PR
- Move the `ReductionTilingStrategy` enum out of
`scf::SCFTilingOptions` and make them visible to `TilingInterface`.
- `PartialTilingInterface` changes
- Pass the `tilingStrategy` used for partial reduction to
`tileToPartialReduction`.
- Pass the reduction dimension along as `const
llvm::SetVector<unsigned> &`.
- Allow `scf::SCFTilingOptions` to set the reduction dimensions that
are to be tiled.
- Change `structured.tiled_reduction_using_for` to allow specification
of the reduction dimensions to be partially tiled.
Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
## Problem
When using `-fsave-optimization-record` with offloading, the Clang
driver passes optimization record options like
`-plugin-opt=opt-remarks-format=yaml` to `clang-nvlink-wrapper`.
However, the wrapper doesn't recognize these options, causing
compilation to fail.
## Solution
This patch adds support for the standard optimization record command
line options to `clang-nvlink-wrapper`, matching the interface provided
by LLD and gold-plugin as documented in the [LLVM Remarks
documentation](https://llvm.org/docs/Remarks.html).
## Changes
- **NVLinkOpts.td**: Added definitions for `opt-remarks-filename`,
`opt-remarks-format`, `opt-remarks-filter`, and
`opt-remarks-with-hotness` options
- **NVLinkOpts.td**: Added `plugin-opt=` aliases for these options to
match what the Clang driver sends
- **ClangNVLinkWrapper.cpp**: Updated `createLTO()` to use command line
arguments when available, falling back to existing global variables
## Testing
The fix allows `-fsave-optimization-record` to work correctly with
offloading, generating optimization records during the LTO phase without
throwing unknown argument errors.
This change maintains backward compatibility and follows the existing
pattern used by other LLVM linkers.
This matches the current binutils behaviour, and the orignal ratified
spec that states that LMUL=1 when the lmul operand is omitted. A variety
of previously invalid vtype instructions are now accepted by the
assembler.
To match binutils, at least one of the vtype operands must be provided.
Also fixes the MCOperandPredicate definition in VTypeIOp, which had one
logic issue and one syntax issue. This is not used by the MC layer
currently, so this change should not affect the existing implementation.
Fixes#143744
As of 20b5728b7b, C always enables Zca, so
the check `C || Zca` is equivalent to just checking for `Zca`.
This replaces any uses of `HasStdExtCOrZca` with a new `HasStdExtZca`
(with the same assembler description, to avoid changes in error
messages), and simplifies everywhere where C++ needed to check for
either C or Zca.
The Subtarget function is just deprecated for the moment.
Currently, the query assumes that a single undef byte implies the rest of
the `EltSize - 1` bytes are undefs, but that's not always true.
e.g. isSplatShuffleMask(
<0,1,2,3,4,5,6,7,undef,undef,undef,undef,0,1,2,3>, 8) should return
false.
---------
Co-authored-by: Wael Yehia <wyehia@ca.ibm.com>
Calling `DeclMustBeEmitted` should not lead to more deserialization, as
it may occur before previous deserialization has finished.
When passed a `VarDecl` with an initializer however, `DeclMustBeEmitted`
needs to know whether that initializer contains side effects. When the
`VarDecl` is deserialized but the initializer is not, this triggers
deserialization of the initializer. To avoid this we add a bit to the
serialization format for `VarDecl`s, indicating whether its initializer
contains side effects or not, so that the `ASTReader` can query this
information directly without deserializing the initializer.
rdar://153085264