Summary:
This flag is fairly canonical among ELF linkers; it allows us to force
the link job to extract a library member if it defines a specific
symbol. This is mostly useful for letting us forcibly extract things
that don't fit the normal model (e.g., kernels) from static libraries.
This works around the fact that
`SymbolFileNativePDB::FindFunctions` currently only supports
`lldb::eFunctionNameTypeFull` and `lldb::eFunctionNameTypeMethod`.
Since `main`'s full name is the same as its base name (`main`), it's
okay to search with `lldb::eFunctionNameTypeFull` when trying to get the
default file and line. With this, `lldb/test/Shell/Driver/TestSingleQuote.test`
passes on Windows with the NativePDB plugin.
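For illustration, the lookup this enables looks roughly like the sketch
below (the setup is assumed, not the exact patch; the call shape follows
`Module::FindFunctions`): because `main`'s full name equals its base
name, a full-name query still resolves it.
```
// Sketch only: assumed setup around a full-name query for "main".
lldb_private::ModuleFunctionSearchOptions options;
options.include_symbols = true;
options.include_inlines = true;
lldb_private::SymbolContextList sc_list;
module_sp->FindFunctions(lldb_private::ConstString("main"),
                         lldb_private::CompilerDeclContext(),
                         lldb::eFunctionNameTypeFull, options, sc_list);
```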
Summary:
This should be the linker's job: if the user creates any bitcode files,
then passing `-flto` to the linker should let the toolchain handle it.
Right now this path is only used when someone does LTO with ld.gold
targeting a CPU, so I think we are safe here, as that will still be
forwarded; for bfd it will be an error, just as it would be on the host.
I think I talked the SYCL team out of using this as well, so I should be
good to delete it.
Fixes #113652.
Calling `Node.isAggregate()` or `Node.isPOD()` when `Node` is declared
but not defined results in a null pointer dereference (or, if assertions
are enabled, an assertion failure).
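A minimal sketch of the guard, assuming `Node` is a
`clang::CXXRecordDecl` (both queries require a definition):
```
#include "clang/AST/DeclCXX.h"

// Sketch of the fix idea: bail out for records that are only declared,
// since isAggregate()/isPOD() dereference the definition's data.
bool isAggregateOrPOD(const clang::CXXRecordDecl &Node) {
  if (!Node.hasDefinition())
    return false;
  return Node.isAggregate() || Node.isPOD();
}
```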
The x86_64_lam_qemu buildbots started failing
(https://lab.llvm.org/buildbot/#/builders/139/builds/5462/steps/2/logs/stdio).
Based on the logs, the HWASan check itself appears correct, but it did
not match the stderr/stdout output. This patch attempts to fix the issue
by flushing stderr/stdout as appropriate.
More context can be found in
https://github.com/llvm/llvm-project/pull/110303
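A minimal sketch of the flushing idea (placement in the test is
assumed):
```
#include <cstdio>

// Flush both standard streams so buffered output is emitted before the
// harness compares stderr/stdout against the expected checks.
static void FlushStdStreams() {
  std::fflush(stdout);
  std::fflush(stderr);
}
```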
For DAP tests running in constrained environments (e.g., Docker
containers), disabling ASLR isn't allowed. So we set `disableASLR=False`
(since https://github.com/llvm/llvm-project/pull/113593).
However, `dap_server.py` currently only forwards the value of
`disableASLR` to the DAP executable if it is set to `True`. If the
DAP executable isn't given a `disableASLR` field, it defaults to
`true` too:
f147437945/lldb/tools/lldb-dap/lldb-dap.cpp (L2103-L2104)
This means that passing `disableASLR=False` from the tests is currently
not possible.
This is also true for many of the other boolean arguments of
`request_launch`. But this patch only addresses `disableASLR` for now
since it's blocking a libc++ patch.
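For illustration, the defaulting pattern on the receiving side looks
roughly like this sketch using `llvm::json` (not the verbatim lldb-dap
source):
```
#include "llvm/Support/JSON.h"

// An absent "disableASLR" field is read as true, so a client that wants
// ASLR enabled must send an explicit false.
bool ShouldDisableASLR(const llvm::json::Object &arguments) {
  return arguments.getBoolean("disableASLR").value_or(true);
}
```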
When `FileAction` opens a file with write access, it neither truncates
the file nor appends to the end of the file if it already exists.
Instead, it writes starting at offset 0.
For example, with the settings `target.output-path` and
`target.error-path`, lldb redirects process stdout/stderr to files. It
then calls this function to write to those files, and the symptoms above
appear.
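A sketch of the flag difference with POSIX `open(2)` (not the verbatim
`FileAction` code):
```
#include <fcntl.h>

void OpenForRedirection(const char *path) {
  // Buggy pattern: writes start at offset 0, but an existing file's old
  // contents survive, so shorter new output leaves stale bytes behind.
  int fd_overwrite = open(path, O_WRONLY | O_CREAT, 0644);
  // Fixed pattern: truncate any existing content on open.
  int fd_truncate = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  (void)fd_overwrite;
  (void)fd_truncate;
}
```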
## Test
- Added a unit test checking the file flags
- Added 2 API tests checking:
- File content overwritten if the file path already exists
- Stdout and stderr redirection to the same file doesn't change its
behavior
Code cleanups for TableGen files; changes include function names,
variable names, and unused imports.
---------
Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
This patch fixes:
mlir/lib/Pass/PassRegistry.cpp:425:37: error: ISO C++ requires the
name after '::~' to be found in the same scope as the name before
'::~' [-Werror,-Wdtor-name]
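For reference, an illustrative reproduction of the diagnostic (not the
actual PassRegistry.cpp code):
```
struct S {
  ~S();
};
typedef S Alias;
// The name after '::~' is looked up in the scope of the class, where
// 'Alias' is not found; Clang accepts this as an extension and warns
// under -Wdtor-name. Spelling it S::~S() {} satisfies ISO C++.
Alias::~Alias() {}
```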
_TIED and _MASK_TIED pseudos have one fewer operand than other pseudos,
so we shouldn't attach the same number of SchedReads to these
instructions.
I don't think we have a way to (explicitly) check scheduling classes, so
I only tested this patch with existing tests.
Add a unique and a sort option to the update_mc_test_check script.
These MC asm/dasm files usually have a large number of lines, and the
lines are mostly similar to each other. These options can be useful when
a maintainer is merging or resolving conflicts, by making the files
identical.
Also fixed a small issue in asm/dasm so that the auto-generated header
line now uses:
1. for asm, ";" instead of "//" as the comment marker
2. for dasm, ";" instead of "#" as the comment marker
Vassil has significant experience helping users with the plugin
interface in Clang, especially around the new efforts to bring plugin
support to Windows. He is also knowledgeable about modules support.
Currently, we store injected template arguments in
`RedeclarableTemplateDecl::CommonBase`. This approach has a couple of
problems:
1. We can only access the injected template arguments of
`RedeclarableTemplateDecl` derived types, but other `Decl` kinds still
make use of the injected arguments (e.g.
`ClassTemplatePartialSpecializationDecl`,
`VarTemplatePartialSpecializationDecl`, and `TemplateTemplateParmDecl`).
2. Accessing the injected template arguments requires the common data
structure to be allocated. This may occur before we determine whether a
previous declaration exists (e.g. when comparing constraints), so if the
template _is_ a redeclaration, we end up discarding the common data
structure.
This patch moves the storage and access of injected template arguments
from `RedeclarableTemplateDecl` to `TemplateParameterList`.
Sirraide has been actively reviewing Sema code for a while now and
definitely has the expertise to help maintain that section of the
compiler. Further, he has been refactoring AST visitors to try to reduce
the compile time overhead associated with them and would be a good
resource for keeping an eye on that part of the code base too.
If we have a minimum vlen, we were adjusting StackSize to change the
unit from vscale to bytes, and then calculating the required padding
size for alignment in bytes. However, we then used that padding size as
an offset in vscale units, resulting in misplaced stack objects.
While it would be possible to adjust the object offsets by dividing
AlignmentPadding by ST.getRealMinVLen() / RISCV::RVVBitsPerBlock, we can
simplify the calculation a bit if instead we adjust the alignment to be
in vscale units.
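A worked example of the unit mismatch (values assumed, not the actual
RISCVFrameLowering code):
```
void PaddingExample() {
  unsigned BytesPerScale = 128 / 64;                  // RealMinVLen / RVVBitsPerBlock
  unsigned StackSizeBytes = 3 * BytesPerScale;        // 3 vscale units -> 6 bytes
  unsigned PaddingBytes = 16 - (StackSizeBytes % 16); // 10 bytes to reach align 16
  // Bug: using PaddingBytes directly as a vscale-unit offset really adds
  // PaddingBytes * BytesPerScale = 20 bytes. Dividing by BytesPerScale,
  // or keeping the alignment in vscale units, places objects correctly.
  (void)PaddingBytes;
}
```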
@topperc This fixes a bug I am seeing after #110312, but I am not 100%
certain I am understanding the code correctly; could you please see if
this makes sense to you?
Implement TrieRawHashMap, which can be used to store objects with their
associated hashes. The user needs to supply a strong hashing function to
guarantee the uniqueness of the hashes of the objects to be inserted.
Hash collisions are not supported and will lead to an error or a failed
insertion.
TrieRawHashMap is thread-safe and lock-free and can be used as a
foundational data structure to implement content-addressable storage.
TrieRawHashMap owns the data stored in it and is designed to be:
* Fast to lookup.
* Fast to "insert" if the data has already been inserted.
* Can be used without locks and doesn't require any knowledge of the
participating threads or extra coordination between threads.
It is not currently designed to be used to insert unique new data with
high contention, due to the limitation on the memory allocator.
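A hypothetical usage sketch; the method names and shapes below are
assumptions for illustration, not the verbatim TrieRawHashMap API:
```
#include <array>
#include <cstdint>

struct FileEntry {
  uint64_t Size;
};

// Hypothetical trie keyed by a 32-byte (e.g. SHA-256) hash; insert() is
// a no-op if the hash is already present, with no external locking.
void Example(ThreadSafeTrieRawHashMap<FileEntry, /*NumHashBytes=*/32> &Trie,
             const std::array<uint8_t, 32> &Hash, FileEntry Entry) {
  Trie.insert({Hash, Entry}); // assumed signature
  if (auto P = Trie.find(Hash))
    (void)*P; // look up previously inserted data
}
```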
Fix a bug where the `mir-strip-debug` pass does not remove debug
locations from bundled instructions.
The problem arises when testing that debug info does not affect
optimization pass output (`llvm-lit` with `-Dllc="llc
-debugify-and-strip-all-safe"`), when a pass operates on MIR with
bundled instructions plus memory operands.
Suppose a MIR test check looks like:
```
CHECK-NEXT: BUNDLE {
CHECK-NEXT: $r3 = LD $r1, $r2 :: (load (s64) from %ir.a, !tbaa !2)
CHECK-NEXT: }
```
Because the `mir-strip-debug` pass does not process bundled
instructions, running `llc -debugify-and-strip-all-safe` on the test
will produce the following output:
```
BUNDLE {
$r3 = LD $r1, $r2, debug-location !DILocation(line: 3, column: 1, scope: <0x608cb2b99b10>) :: (load (s64) from %ir.a, !tbaa !2)
}
```
The test will then fail, but it shouldn't.
The root cause seems to be that the `mir-strip-debug` pass does not
remove debug locations from bundled instructions, though it should.
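A sketch of the fix idea (function shape assumed, not the exact patch):
```
void stripBundledDebugLocs(llvm::MachineFunction &MF) {
  for (llvm::MachineBasicBlock &MBB : MF)
    // instrs() visits instructions inside bundles; the default iterator
    // treats a BUNDLE as a single top-level unit and skips its contents.
    for (llvm::MachineInstr &MI : MBB.instrs())
      MI.setDebugLoc(llvm::DebugLoc());
}
```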
Summary:
We have the ability to schedule callbacks after certain events complete.
Currently we can register an arbitrary callback in CUDA, but can't in
AMDGPU. I am planning on using this support to move the RPC handling to
a separate thread, then using these callbacks to suspend / resume it
when no kernels are running. This is a preliminary patch to keep this
noise out of that one.
Fixes #112974
Partially fixes #70103
### Changes
- Added a new TableGen-based way of lowering DirectX intrinsics to DXIL
ops.
- Added the `int_dx_group_memory_barrier_with_group_sync` intrinsic in
`IntrinsicsDirectX.td`
- Added expansion for `int_dx_group_memory_barrier_with_group_sync` in
`DXILIntrinsicExpansion.cpp`
- Added a DXIL backend test case
### Related PRs
* [[clang][HLSL] Add GroupMemoryBarrierWithGroupSync intrinsic
#111883](https://github.com/llvm/llvm-project/pull/111883)
* [[SPIRV] Add GroupMemoryBarrierWithGroupSync intrinsic
#111888](https://github.com/llvm/llvm-project/pull/111888)
Shafik and Vlad are both members of WG21 and both have familiarity with
reasoning about the C++ standard. They've both volunteered to help
answer conformance-related questions, and this is an area where we get
quite a few questions, so having a larger stable of maintainers is
quite useful.
This patch updates the translation of `omp.wsloop` with a nested
`omp.simd` to prevent uses of block arguments defined by the latter from
triggering null pointer dereferences.
This happens because the inner `omp.simd` operation representing
composite `do simd` constructs is currently skipped and not translated,
but this results in block arguments defined by it not being mapped to an
LLVM value. The proposed solution is to map these block arguments to the
LLVM value associated with the corresponding operand, which is defined
above.
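A sketch of that mapping (operation and accessor names assumed, not the
verbatim translation code):
```
// Map each entry-block argument of the skipped omp.simd region to the
// LLVM value already translated for its corresponding operand, so later
// uses of the block argument resolve instead of hitting a null pointer.
for (auto [blockArg, operand] : llvm::zip_equal(
         simdOp.getRegion().getArguments(), simdOp->getOperands()))
  moduleTranslation.mapValue(blockArg,
                             moduleTranslation.lookupValue(operand));
```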
Kernel launches in CUF are converted to `gpu.launch_func`. When the
kernel has `cluster_dims` specified, these get carried over to the
`gpu.launch_func` operation. This patch updates the special conversion
of `gpu.launch_func`, when cluster dims are present, to use the newly
added entry point.
At the moment, `GenericPadOpVectorizationPattern` implements two
orthogonal transformations:
1. Rewrites `tensor::PadOp` into a sequence of `tensor::EmptyOp`,
`linalg::FillOp` and `tensor::InsertSliceOp`.
2. Vectorizes (where possible) `tensor::InsertSliceOp` (see
`tryVectorizeCopy`).
This patch splits `GenericPadOpVectorizationPattern` into two separate
patterns:
1. `GeneralizePadOpPattern` for the first transformation (note that
currently `GenericPadOpVectorizationPattern` inherits from
`GeneralizePadOpPattern`).
2. `InsertSliceVectorizePattern` to vectorize `tensor::InsertSliceOp`.
With this change, we gain the following:
* a clear separation between pre-processing and vectorization
transformations/stages,
* a path to support masked vectorization for `tensor.insert_slice`
(with a dedicated pattern for vectorization, it is much easier to
specify the input vector sizes used in masking),
* more opportunities to vectorize `tensor.insert_slice`.
Note for downstream users:
--------------------------
If you were using `populatePadOpVectorizationPatterns`, following this
change you will also have to add
`populateInsertSliceVectorizationPatterns`.
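A minimal migration sketch (surrounding setup assumed):
```
mlir::RewritePatternSet patterns(ctx);
mlir::linalg::populatePadOpVectorizationPatterns(patterns);
// New after this change: keeps tensor.insert_slice vectorization working.
mlir::linalg::populateInsertSliceVectorizationPatterns(patterns);
```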
Finer implementation details:
-----------------------------
1. The majority of changes in this patch are copy & paste + some edits.
1.1. The only functional change is that the vectorization of
`tensor.insert_slice` is now broadly available (as opposed to being
constrained to the pad vectorization pattern:
`GenericPadOpVectorizationPattern`).
1.2. Following on from the above, `@pad_and_insert_slice_dest` is
updated. As expected, the input `tensor.insert_slice` Op is no
longer "preserved" and instead gets vectorized successfully.
2. The `linalg.fill` case in `getConstantPadVal` works under the
assumption that only _scalar_ source values can be used. That's
consistent with the definition of the Op, but it's not tested at the
moment. Hence a test case in Linalg/invalid.mlir is added.
3. The behavior of the two TD vectorization Ops,
`transform.structured.vectorize_children_and_apply_patterns` and
`transform.structured.vectorize` is preserved.