Commit Graph

493 Commits

Author SHA1 Message Date
Anastasia Stulova
79b222c39f [OpenCL] Fix types with signed prefix in arginfo metadata.
Signed prefix is removed and the single word spelling is
printed for the scalar types.

Tags: #clang

Differential Revision: https://reviews.llvm.org/D96161
2021-02-09 15:13:19 +00:00
Anastasia Stulova
ecc8ac3f08 [OpenCL] Fix pipe type printing in arg info metadata
Pipe element type spelling for arg info metadata
should follow the same behavior as normal type spelling.

We should only use the canonical type spelling in the
base type field.

This patch also removed duplication in type handling.

Tags: #clang

Differential Revision: https://reviews.llvm.org/D96151
2021-02-08 16:05:13 +00:00
Stanislav Mekhanoshin
8e661d3d9c [AMDGPU] Set s-memtime-inst feature from clang
Differential Revision: https://reviews.llvm.org/D95733
2021-02-01 14:20:43 -08:00
Sven van Haastregt
526c42e76c [OpenCL] Hide sampler-less read_image builtins before CL1.2
Ensure sampler-less image read functions are not available with
`-fdeclare-opencl-builtins` before OpenCL 1.2.
2021-01-28 11:14:19 +00:00
Sven van Haastregt
79c727328b [clang] Fix signedness in vector bitcast evaluation
The included test case triggered a sign assertion on the result in
`Success()`.  This was caused by the APSInt created for a bitcast
having its signedness bit inverted.  The second APSInt constructor
argument is `isUnsigned`, so invert the result of
`isSignedIntegerType`.

Relanding this patch after reverting.  The test case had to be updated
to be insensitive to 32/64-bit extractelement indices.

Differential Revision: https://reviews.llvm.org/D95135
2021-01-27 09:30:26 +00:00
Sven van Haastregt
b16fb1ffc3 Revert "[clang] Fix signedness in vector bitcast evaluation"
This reverts commit 14947cd047 because
it broke clang-cmake-armv7-quick.
2021-01-25 12:43:30 +00:00
Sven van Haastregt
14947cd047 [clang] Fix signedness in vector bitcast evaluation
The included test case triggered a sign assertion on the result in
`Success()`.  This was caused by the APSInt created for a bitcast
having its signedness bit inverted.  The second APSInt constructor
argument is `isUnsigned`, so invert the result of
`isSignedIntegerType`.

Differential Revision: https://reviews.llvm.org/D95135
2021-01-25 12:01:42 +00:00
Nikita Popov
65fd034b95 [FunctionAttrs] Infer willreturn for functions without loops
If a function doesn't contain loops and does not call non-willreturn
functions, then it is willreturn. Loops are detected by checking
for backedges in the function. We don't attempt to handle finite
loops at this point.

Differential Revision: https://reviews.llvm.org/D94633
2021-01-21 20:29:33 +01:00
Sven van Haastregt
29d375f5ff [OpenCL][NFC] Improve OpenCL test file naming
Change "negative" into "invalid" and put "invalid" at the beginning of
the file name, following the bulk of the invalid tests in the
SemaOpenCL directory.

Use the "invalid-" prefix only for tests that contain only invalid
constructs.

Drop the "valid" suffix for CodeGen tests, as inputs in this directory
are supposed to be valid anyway.
2021-01-06 14:16:44 +00:00
Fangrui Song
fd739804e0 [test] Add {{.*}} to make ELF tests immune to dso_local/dso_preemptable/(none) differences
For a default visibility external linkage definition, dso_local is set for ELF
-fno-pic/-fpie and COFF and Mach-O. Since default clang -cc1 for ELF is similar
to -fpic ("PIC Level" is not set), this nuance causes unneeded binary format differences.

To make emitted IR similar, ELF -cc1 -fpic will default to -fno-semantic-interposition,
which sets dso_local for default visibility external linkage definitions.

To make this flip smooth and enable future (dso_local as definition default),
this patch replaces (function) `define ` with `define{{.*}} `,
(variable/constant/alias) `= ` with `={{.*}} `, or inserts appropriate `{{.*}} `.
2020-12-31 00:27:11 -08:00
Fangrui Song
6b3351792c [test] Add {{.*}} to make tests immune to dso_local/dso_preemptable/(none) differences
For a definition (of most linkage types), dso_local is set for ELF -fno-pic/-fpie
and COFF, but not for Mach-O.  This nuance causes unneeded binary format differences.

This patch replaces (function) `define ` with `define{{.*}} `,
(variable/constant/alias) `= ` with `={{.*}} `, or inserts appropriate `{{.*}} `
if there is an explicit linkage.

* Clang will set dso_local for Mach-O, which is currently implied by TargetMachine.cpp. This will make COFF/Mach-O and executable ELF similar.
* Eventually I hope we can make dso_local the textual LLVM IR default (write explicit "dso_preemptable" when applicable) and -fpic ELF will be similar to everything else. This patch helps move toward that goal.
2020-12-30 20:52:01 -08:00
Juneyoung Lee
9b29610228 Use unary CreateShuffleVector if possible
As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used
instead of `IRBuilder::CreateShuffleVector(X, Undef, Mask)`.
Let's update them.

Actually, it would have been more natural if the patches were made in this order:
(1) let them use unary CreateShuffleVector first
(2) update IRBuilder::CreateShuffleVector to use poison as a placeholder value (D93793)

The order is swapped, but in terms of correctness it is still fine.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D93923
2020-12-30 22:36:08 +09:00
Juneyoung Lee
278aa65cc4 [IR] Let IRBuilder's CreateVectorSplat/CreateShuffleVector use poison as placeholder
This patch updates IRBuilder to create insertelement/shufflevector using poison as a placeholder.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D93793
2020-12-30 04:21:04 +09:00
Tony
92ab6ed667 [AMDGPU] Add missing targets to amdgpu-features.cl
Differential Revision: https://reviews.llvm.org/D93017
2020-12-12 18:19:02 +00:00
Melanie Blower
320af6b138 Create SPIRABIInfo to enable SPIR_FUNC calling convention.
Background: Call to library arithmetic functions for div is emitted by the
compiler and it set wrong “C” calling convention for calls to these functions,
whereas library functions are declared with `spir_function` calling convention.
InstCombine optimization replaces such calls with “unreachable” instruction.
It looks like clang lacks SPIRABIInfo class which should specify default
calling conventions for “system” function calls. SPIR supports only
SPIR_FUNC and SPIR_KERNEL calling convention.

Reviewers: Erich Keane, Anastasia

Differential Revision: https://reviews.llvm.org/D92721
2020-12-12 05:48:20 -08:00
Yaxun (Sam) Liu
efc063b621 Fix lit test failure due to 0b81d9
These lit tests now requires amdgpu-registered-target since they
use clang driver and clang driver passes an LLVM option which
is available only if amdgpu target is registered.

Change-Id: I2df31967409f1627fc6d342d1ab5cc8aa17c9c0c
2020-12-07 19:50:21 -05:00
Alex Richardson
51e09e1d5a [AMDGPU] Set the default globals address space to 1
This will ensure that passes that add new global variables will create them
in address space 1 once the passes have been updated to no longer default
to the implicit address space zero.
This also changes AutoUpgrade.cpp to add -G1 to the DataLayout if it wasn't
already to present to ensure bitcode backwards compatibility.

Reviewed by: arsenm

Differential Revision: https://reviews.llvm.org/D84345
2020-11-20 15:46:53 +00:00
Simon Pilgrim
c1e3d38301 [CodeGenOpenCL] Fix check prefix typo on convergent.cl test
Noticed while fixing unused prefix warnings - there isn't actually any diff in the loop unrolled ir between old/new pass managers any more, so the broken checks were superfluous
2020-11-11 15:44:59 +00:00
Tim Renouf
89d41f3a2b [AMDGPU] Add gfx1033 target
Differential Revision: https://reviews.llvm.org/D90447

Change-Id: If2650fc7f31bbdd49c76e74a9ca8e3734d769761
2020-11-03 16:27:48 +00:00
Tim Renouf
ee3e642627 [AMDGPU] Add gfx90c target
This differentiates the Ryzen 4000/4300/4500/4700 series APUs that were
previously included in gfx909.

Differential Revision: https://reviews.llvm.org/D90419

Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d
2020-11-03 16:27:43 +00:00
Jon Chesterfield
dee7704829 [AMDGPU] Add __builtin_amdgcn_grid_size
[AMDGPU] Add __builtin_amdgcn_grid_size

Similar to D76772, loads the data from the dispatch pointer. Marked invariant.

Patch also updates the openmp devicertl to use this builtin.

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D90251
2020-10-29 16:25:13 +00:00
Tony
5984097823 [AMDGPU] Add missing support for targets
- Add missing tests.

Differential Revision: https://reviews.llvm.org/D90212
2020-10-27 15:36:31 +00:00
Richard Smith
6781fee085 Don't permit array bound constant folding in OpenCL.
Permitting non-standards-driven "do the best you can" constant-folding
of array bounds is permitted solely as a GNU compatibility feature. We
should not be doing it in any language mode that is attempting to be
conforming.

From https://reviews.llvm.org/D20090 it appears the intent here was to
permit `__constant int` globals to be used in array bounds, but the
change in that patch only added half of the functionality necessary to
support that in the constant evaluator. This patch adds the other half
of the functionality and turns off constant folding for array bounds in
OpenCL.

I couldn't find any spec justification for accepting the kinds of cases
that D20090 accepts, so a reference to where in the OpenCL specification
this is permitted would be useful.

Note that this change also affects the code generation in one test:
because after 'const int n = 0' we now treat 'n' as a constant
expression with value 0, it's now a null pointer, so '(local int *)n'
forms a null pointer rather than a zero pointer.

Reviewed By: Anastasia

Differential Revision: https://reviews.llvm.org/D89520
2020-10-20 16:52:28 -07:00
Matt Arsenault
0a7cd99a70 Reapply "OpaquePtr: Add type to sret attribute"
This reverts commit eb9f7c28e5.

Previously this was incorrectly handling linking of the contained
type, so this merges the fixes from D88973.
2020-10-16 11:05:02 -04:00
Stanislav Mekhanoshin
d1beb95d12 [AMDGPU] gfx1032 target
Differential Revision: https://reviews.llvm.org/D89487
2020-10-15 12:41:18 -07:00
Tim Renouf
666ef0db20 [AMDGPU] Add gfx602, gfx705, gfx805 targets
At AMD, in an internal audit of our code, we found some corner cases
where we were not quite differentiating targets enough for some old
hardware. This commit is part of fixing that by adding three new
targets:

* The "Oland" and "Hainan" variants of gfx601 are now split out into
  gfx602. LLPC (in the GPUOpen driver) and other front-ends could use
  that to avoid using the shaderZExport workaround on gfx602.

* One variant of gfx703 is now split out into gfx705. LLPC and other
  front-ends could use that to avoid using the
  shaderSpiCsRegAllocFragmentation workaround on gfx705.

* The "TongaPro" variant of gfx802 is now split out into gfx805.
  TongaPro has a faster 64-bit shift than its former friends in gfx802,
  and a subtarget feature could be set up for that to take advantage of
  it. This commit does not make that change; it just adds the target.

V2: Add clang changes. Put TargetParser list in order.
V3: AMDGCNGPUs table in TargetParser.cpp needs to be in GPUKind order,
    so fix the GPUKind order.

Differential Revision: https://reviews.llvm.org/D88916

Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d
2020-10-10 17:22:22 +01:00
Michael Liao
8c36eaf037 [clang][opencl][codegen] Remove the insertion of correctly-rounded-divide-sqrt-fp-math fn-attr.
- `-cl-fp32-correctly-rounded-divide-sqrt` is already handled in a
  per-instruction manner by annotating the accuracy required. There's no
  need to add that fn-attr. So far, there's no in-tree backend handling
  that attr and that OpenCL specific option.
- In case that out-of-tree backends are broken, this change could be
  reverted if those backends could not be fixed.

Differential Revision: https://reviews.llvm.org/D88424
2020-10-01 11:07:39 -04:00
Tres Popp
eb9f7c28e5 Revert "OpaquePtr: Add type to sret attribute"
This reverts commit 55c4ff91bd.

Issues were introduced as discussed in https://reviews.llvm.org/D88241
where this change made previous bugs in the linker and BitCodeWriter
visible.
2020-09-29 10:31:04 +02:00
Matt Arsenault
55c4ff91bd OpaquePtr: Add type to sret attribute
Make the corresponding change that was made for byval in
b7141207a4. Like byval, this requires a
bulk update of the test IR tests to include the type before this can
be mandatory.
2020-09-25 14:07:30 -04:00
Stanislav Mekhanoshin
59691dc874 [AMDGPU] Make ds fp atomics overloadable
Differential Revision: https://reviews.llvm.org/D87947
2020-09-23 11:39:50 -07:00
Matt Arsenault
30eeb742f1 clang: Use byref for aggregate kernel arguments
Add address space to indirect abi info and use it for kernels.

Previously, indirect arguments assumed assumed a stack passed object
in the alloca address space using byval. A stack pointer is unsuitable
for kernel arguments, which are passed in a separate, constant buffer
with a different address space.

Start using the new byref for aggregate kernel arguments. Previously
these were emitted as raw struct arguments, and turned into loads in
the backend. These will lower identically, although with byref you now
have the option of applying an explicit alignment. In the future, a
reasonable implementation would use byref for all kernel arguments
(this would be a practical problem at the moment due to losing things
like noalias on pointer arguments).

This is mostly to avoid fighting the optimizer's treatment of
aggregate load/store. SROA and instcombine both turn aggregate loads
and stores into a long sequence of element loads and stores, rather
than the optimizable memcpy I would expect in this situation. Now an
explicit memcpy will be introduced up-front which is better understood
and helps eliminate the alloca in more situations.

This skips using byref in the case where HIP kernel pointer arguments
in structs are promoted to global pointers. At minimum an additional
patch is needed to allow coercion with indirect arguments. This also
skips using it for OpenCL due to the current workaround used to
support kernels calling kernels. Distinct function bodies would need
to be generated up front instead of emitting an illegal call.
2020-08-06 15:52:26 -04:00
Stanislav Mekhanoshin
ea7d0e2996 [AMDGPU] gfx1031 target
Differential Revision: https://reviews.llvm.org/D85337
2020-08-05 12:36:26 -07:00
Alexey Bader
8d27be8dba [OpenCL] Add global_device and global_host address spaces
This patch introduces 2 new address spaces in OpenCL: global_device and global_host
which are a subset of a global address space, so the address space scheme will be
looking like:

```
generic->global->host
                          ->device
             ->private
             ->local
constant
```

Justification: USM allocations may be associated with both host and device memory. We
want to give users a way to tell the compiler the allocation type of a USM pointer for
optimization purposes. (Link to the Unified Shared Memory extension:
https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/USM/cl_intel_unified_shared_memory.asciidoc)

Before this patch USM pointer could be only in opencl_global
address space, hence a device backend can't tell if a particular pointer
points to host or device memory. On FPGAs at least we can generate more
efficient hardware code if the user tells us where the pointer can point -
being able to distinguish between these types of pointers at compile time
allows us to instantiate simpler load-store units to perform memory
transactions.

Patch by Dmitry Sidorov.

Reviewed By: Anastasia

Differential Revision: https://reviews.llvm.org/D82174
2020-07-29 17:24:53 +03:00
Max Kazantsev
360ab70712 [SimplifyCFG] Do not create unneeded PR Phi in block with convergent calls
We do not thread blocks with convergent calls, but this check was missing
when we decide to insert PR Phis into it (which we only do for threading).

Differential Revision: https://reviews.llvm.org/D83936
Reviewed By: nikic
2020-07-22 13:53:50 +07:00
Max Kazantsev
90798e09e2 Re-enable "[InstCombine] Simplify boolean Phis with const inputs using CFG"
This reverts commit b893822e32.

+ Clang test fixes
+ Insertion point fix for landing pads
2020-07-16 16:09:08 +07:00
Fangrui Song
b0b5162fc2 [Driver] Pass -gno-column-info instead of -dwarf-column-info
Making -g[no-]column-info opt out reduces the length of a typical CC1 command line.
Additionally, in a non-debug compile, we won't see -dwarf-column-info.
2020-07-05 11:50:38 -07:00
Roman Lebedev
7fed3cfadb [clang] Fix two tests that are affected by llvm opt change 2020-07-04 18:26:22 +03:00
Dmitry Preobrazhensky
53422e8b4f [AMDGPU] Added support of new inline assembler constraints
Added support for constraints 'I', 'J', 'L', 'B', 'C', 'Kf', 'DA', 'DB'.

See https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html#Machine-Constraints.

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D81657
2020-07-03 18:01:12 +03:00
Sven van Haastregt
bd46a56474 [OpenCL] Reject block arguments
OpenCL 2.0 does not allow block arguments, primarily because it is
difficult to support function pointers on the various architectures
that OpenCL targets.  Clang was still accepting them.

Rename and reuse the `err_opencl_half_param` diagnostic.

Fixes PR46324.

Differential Revision: https://reviews.llvm.org/D82313
2020-06-29 14:13:12 +01:00
Melanie Blower
f4aaed3bf1 Reland D81869 "Modify FPFeatures to use delta not absolute settings"
This reverts commit defd43a5b3.
with correction to solve msan report

To solve https://bugs.llvm.org/show_bug.cgi?id=46166 where the
floating point settings in PCH files aren't compatible, rewrite
FPFeatures to use a delta in the settings rather than absolute settings.
With this patch, these floating point options can be benign.

Reviewers: rjmccall

Differential Revision: https://reviews.llvm.org/D81869
2020-06-27 01:34:57 -07:00
Matt Arsenault
9e03bdebc1 AMDGPU: Add llvm.amdgcn.sqrt intrinsic
I spread the GlobalISel test into the regular one, which I've been
avoiding so far.
2020-06-26 15:07:07 -04:00
Melanie Blower
defd43a5b3 Revert "Revert "Revert "Modify FPFeatures to use delta not absolute settings"""
This reverts commit 9518763d71.
Memory sanitizer fails in CGFPOptionsRAII::CGFPOptionsRAII dtor
2020-06-26 08:47:04 -07:00
Melanie Blower
9518763d71 Revert "Revert "Modify FPFeatures to use delta not absolute settings""
This reverts commit b55d723ed6.
Reapply Modify FPFeatures to use delta not absolute settings

To solve https://bugs.llvm.org/show_bug.cgi?id=46166 where the
floating point settings in PCH files aren't compatible, rewrite
FPFeatures to use a delta in the settings rather than absolute settings.
With this patch, these floating point options can be benign.

Reviewers: rjmccall

Differential Revision: https://reviews.llvm.org/D81869
2020-06-26 08:00:08 -07:00
Melanie Blower
b55d723ed6 Revert "Modify FPFeatures to use delta not absolute settings"
This reverts commit 3a748cbf86.
I'm reverting this commit because I forgot to format the commit message
propertly. Sorry for the thrash.
2020-06-26 07:52:57 -07:00
Melanie Blower
3a748cbf86 Modify FPFeatures to use delta not absolute settings 2020-06-26 07:41:09 -07:00
Stanislav Mekhanoshin
9ee272f13d [AMDGPU] Add gfx1030 target
Differential Revision: https://reviews.llvm.org/D81886
2020-06-15 16:18:05 -07:00
Stanislav Mekhanoshin
58de24ce6c [AMDGPU] Sorted targets in amdgpu-features.cl. NFC. 2020-06-12 11:57:40 -07:00
Akira Hatanaka
c9a52de002 [CodeGen] Simplify the way lifetime of block captures is extended
Rather than pushing inactive cleanups for the block captures at the
entry of a full expression and activating them during the creation of
the block literal, just call pushLifetimeExtendedDestroy to ensure the
cleanups are popped at the end of the scope enclosing the block
expression.

rdar://problem/63996471

Differential Revision: https://reviews.llvm.org/D81624
2020-06-11 16:06:22 -07:00
Douglas Yung
086be9fb20 Fix test on PS4 linux bot.
Commit 301a6da8c2 changed the test and modified a CHECK
line that is inconsisent with similar lines elsewhere in the file and was causing failures
when run in slightly different configurations. This change makes the line more consistent
and should fix the bot failure.

Failure link: http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/builds/68593
2020-06-02 20:17:02 +00:00
Matt Arsenault
301a6da8c2 AMDGPU: Fix clang side null pointer value for private
The change to fold_priv_arith looks strange to me, but this was
already the untested behavior for local.
2020-06-02 09:23:46 -04:00