clang-p2996

Author	SHA1	Message	Date
Matt Arsenault	925aa514ed	AMDGPU: Use DataExtractor for printf string extraction Attempt 2 to fix big endian bot failures.	2023-01-09 14:03:42 -05:00
Ivan Kosarev	2d945ef864	[AMDGPU][NFC] Rename GFX10A16 operands. They do not seem to be GFX10-specific anymore. Also renames the corresponding feature. Reviewed By: dp Differential Revision: https://reviews.llvm.org/D141069	2023-01-09 17:18:46 +00:00
Jay Foad	6a6f62a719	[AMDGPU] S_MULK_I32 does not define SCC. NFCI. Differential Revision: https://reviews.llvm.org/D141281	2023-01-09 15:44:56 +00:00
Benjamin Kramer	b6942a2880	[NFC] Hide implementation details in anonymous namespaces	2023-01-08 17:37:02 +01:00
Matt Arsenault	270e96f435	Revert "AMDGPU: Invert handling of enqueued block detection" This reverts commit `47288cc977`. The runtime is having trouble with this at -O0 when the inputs are always enabled.	2023-01-07 21:48:07 -05:00
Matt Arsenault	68b6cabd9e	AMDGPU: Use getTypeAllocSize	2023-01-06 21:33:19 -05:00
Matt Arsenault	47554a0c73	AMDGPU: Use more accurate IR type for block handle The device library uses this as a struct with a pointer sized integer and 2 ints.	2023-01-06 21:23:28 -05:00
Matt Arsenault	47288cc977	AMDGPU: Invert handling of enqueued block detection Invert the sense of the attribute and let the attributor figure this out like everything else. If needed we can have the not-OpenCL languages set amdgpu-no-default-queue and amdgpu-no-completion-action up front so they never have to pay the cost. There are also so many of these now, the offset use API should probably consider all of them at once. Maybe they should merge into one attribute with used fields. Having separate functions for each field in AMDGPUBaseInfo is also not the greatest API (might as well fix this when the patch to get the object version from the module lands).	2023-01-06 21:16:08 -05:00
Matt Arsenault	0416883dc1	AMDGPU: Fix enqueue block lowering for opaque pointers This was looking for a specific constant cast of the function, when the type doesn't matter. Doesn't bother trying to handle typed pointers, it will just assert. Things probably don't work completely correctly if the block kernel address is captured somewhere else, but that wouldn't work before either. The uses should really be loads out of the handle, and the handle initializer should contain the kernel address.	2023-01-06 21:15:39 -05:00
Matt Arsenault	0995a31e5d	AMDGPU: Try to fix 32-bit build bot	2023-01-06 17:34:25 -05:00
Matt Arsenault	40078a6b71	AMDGPU: Use BinaryByteStream in printf expansion Attempt to fix test failures on big endian bots. This pass definitely needs more test coverage.	2023-01-06 17:22:13 -05:00
Joe Nash	1b12d7d15b	[AMDGPU] Combine redundant Asm64 and AsmVOP3DPPBase. NFC Reduce duplication in the codebase by combining these fields in VOPProfile. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D141088	2023-01-06 14:09:42 -05:00
Alexey Bataev	9b5f62685a	[SLP]Fix cost of the broadcast buildvector/gather. Need to include the cost of the initial insertelement to the cost of the broadcasts. Also, need to adjust the cost of the gather/buildvector if the element is inserted into poison/undef vector. Differential Revision: https://reviews.llvm.org/D140498	2023-01-06 09:25:05 -08:00
Guillaume Chatelet	87b6b347fc	Revert D141134 "[NFC] Only expose getXXXSize functions in TypeSize" The patch should be discussed further. This reverts commit `dd56e1c92b`.	2023-01-06 15:27:50 +00:00
Guillaume Chatelet	dd56e1c92b	[NFC] Only expose getXXXSize functions in TypeSize Currently 'TypeSize' exposes two functions that serve the same purpose: - getFixedSize / getFixedValue - getKnownMinSize / getKnownMinValue source : `bf82070ea4/llvm/include/llvm/Support/TypeSize.h (L337-L338)` This patch offers to remove one of the two and stick to a single function in the code base. Differential Revision: https://reviews.llvm.org/D141134	2023-01-06 15:24:52 +00:00
Jay Foad	b7ef63af56	[AMDGPU] Add a feature for VALUTransUseHazard NFCI. This just allows us to experiment with enabling/disabling the workaround on different subtargets. Differential Revision: https://reviews.llvm.org/D141121	2023-01-06 13:50:17 +00:00
Juan Manuel MARTINEZ CAAMAÑO	543db09b97	[CodeGen][AMDGPU] EXTRACT_VECTOR_ELT: input vector element type can differ from output type In function SITargetLowering::performExtractVectorElt, the output type was not considered which could lead to type mismatches later. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D139943	2023-01-06 09:46:02 +01:00
Vang Thao	25d72330ff	[AMDGPU] Add .uniform_work_group_size metadata to v5 Amdgpu kernel with function attribute "uniform-work-group-size"="true" requires uniform work group size (i.e. each dimension of global size is a multiple of corresponding dimension of work group size). hipExtModuleLaunchKernel allows to launch HIP kernel with non-uniform workgroup size, which makes it necessary for runtime to check and enforce uniform workgroup size if kernel requires it. To let runtime be able to enforce that, this metadata is needed to indicate that the kernel requires uniform workgroup size. Reviewed By: kzhuravl, arsenm Differential Revision: https://reviews.llvm.org/D141012	2023-01-05 21:29:56 +00:00
Alexander Timofeev	6daa983c9d	[AMDGPU] MachineScheduler: schedule execution metric added for the UnclusteredHighRPStage Since the divergence-driven ISel was fully enabled, we have more VGPRs available. The MachineScheduler, trying to take advantage of that, bumps up the occupancy, sacrificing the hiding of memory access latency. This really spoils the initially good schedule. A new metric that reflects the latency hiding quality of the schedule has been created to balance occupancy against latency. The metric is based on the latency model, which computes the bubble to working cycles ratio. Then we use this ratio to decide if the higher occupancy schedule is profitable as follows: Profit = NewOccupancy/OldOccupancy * OldMetric/NewMetric Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D139710	2023-01-05 21:10:56 +01:00
Matt Arsenault	7c327c2fbb	AMDGPU: Fix broken opaque pointer handling in printf pass This was directly considering the pointee type, and also applying special semantics to constant address space.	2023-01-05 13:48:32 -05:00
Matt Arsenault	7b922fc0c3	AMDGPU: Fix broken and permissive handling of printf format strings This was completely broken with opaque pointers because it was specifically looking for a constant expression with the global variable as the first operand. Strip casts like normal, and properly validate all of the restrictions rather than silently ignoring any unhandled cases. Also be stricter that we aren't calling into some unresolved or non-constant format string. Also converts the test to opaque pointers and generated tests. There's more broken initializer handling for strings inside the format string processing too, but there's just no test coverage for this at all.	2023-01-05 09:18:00 -05:00
serge-sans-paille	38818b60c5	Move from llvm::makeArrayRef to ArrayRef deduction guides - llvm/ part Use deduction guides instead of helper functions. The only non-automatic changes have been: 1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t), (uint8_t)) 2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There were a few similar situations across the codebase. 3. ADL doesn't seem to work the same for deduction-guides and functions, so at some point the llvm namespace must be explicitly stated. 4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as no-op is not supported (a constructor cannot achieve that). Per reviewers' comment, some useless makeArrayRef have been removed in the process. This is a follow-up to https://reviews.llvm.org/D140896 that introduced the deduction guides. Differential Revision: https://reviews.llvm.org/D140955	2023-01-05 14:11:08 +01:00
Matt Arsenault	8dfe60c356	AMDGPU: Set scratch_en if there is dynamic stack but no fixed stack	2023-01-04 20:51:18 -05:00
Anshil Gandhi	4bbcbdaee5	[AMDGPU] Unify divergent nodes if the PostDom tree has one root This patch allows AMDGPUUnifyDivergenceExitNodes pass to transform a function whose PDT has exactly one root and ends in a branch instruction. Fixes https://github.com/llvm/llvm-project/issues/58861. Reviewed By: ruiling, arsenm Differential Revision: https://reviews.llvm.org/D139780	2023-01-04 10:45:03 -07:00
Jay Foad	6f7ff9b933	[MC] Consistently use MCInstrDesc::getImplicitUses and getImplicitDefs. NFC.	2023-01-04 13:16:12 +00:00
James Y Knight	7ff64d44b9	[AMDGPU] Fix useDeprecatedPositionallyEncodedOperands errors. This is a follow-on to https://reviews.llvm.org/D134073. The errors in the R600 half were fixed previously in https://reviews.llvm.org/D134078. Originally, I thought that the fixes to the AMDGPU half would be tricky, but upon taking another look, there were only a couple minor issues that needed fixing: 1. Previously, buffer load instructions (`BUFFER_LOAD__LDS_`) were populating the `vdata` field in the instruction from the `swz` operand. This was incorrect, but harmless, as when the LDS option is set, the instruction does not use the vdata field. 2. The `BUFFER_STORE_LDS_DWORD_gfx90a` instruction was populating `acc` from the `swz` operand, because `acc` was set to `?`. (I believe that the intent here was to leave the instruction bit as an "unknown value", but you can't do that except by setting the bits on `Inst` directly). Also harmless, for the same reason. Differential Revision: https://reviews.llvm.org/D140918	2023-01-03 17:52:10 -05:00
Ron Lieberman	750e1c8dbd	Revert "[libomptarget][plugin-nextgen] fix for [TypePromotion] NewPM support." This reverts commit `135f6a1ee8`.	2023-01-03 12:26:39 -06:00
Ron Lieberman	135f6a1ee8	[libomptarget][plugin-nextgen] fix for [TypePromotion] NewPM support.	2023-01-03 11:04:13 -06:00
Matt Arsenault	687e0e205e	AMDGPU: Create alloca wide load/store with explicit alignment This was introducing transient UB by using the default alignment of a larger vector type.	2023-01-03 11:29:18 -05:00
Matt Arsenault	49caf70121	AMDGPU: Use cast instead of unchecked dyn_cast	2023-01-03 10:32:10 -05:00
Matt Arsenault	6fed2c90d3	AMDGPU: Diagnose which LDS global failed to lower Also lowercase the message to start since that seems to be the prevailing convention for error messages.	2023-01-03 09:31:07 -05:00
Thomas Symalla	9aa0ee36fe	[NFC][AMDGPU] Make method declarations in SIInstrInfo equivalent to their definitions. Some functions from SIInstrInfo have their operands named differently in their declarations vs. their defs. This was caught by cppcheck. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D140778	2022-12-30 19:18:41 +01:00
Ivan Kosarev	0a6dc9a816	[AMDGPU][AsmParser] Refine parsing cache policy modifiers. Reviewed By: dp, arsenm Differential Revision: https://reviews.llvm.org/D140108	2022-12-30 15:44:34 +00:00
Dmitry Preobrazhensky	e7a306310b	[AMDGPU][GFX11] Correct tied src2 of v_fmac_f16_e64 src2 was incorrectly defined as VSrc_f16 but it is tied to dst which is VGPR_32. As a result, disassembler failed to decode src2. Differential Revision: https://reviews.llvm.org/D140299	2022-12-30 16:42:15 +03:00
Dmitry Preobrazhensky	9f40d9ffd1	[AMDGPU][MC][GFX11] Correct encoding of neg modifier for v_dot2_f32_bf16 Fix a bug with neg_lo:[0,1,0] and neg_hi:[0,1,0] modifiers - they are accepted but not encoded. Differential Revision: https://reviews.llvm.org/D140470	2022-12-30 16:25:22 +03:00
Matt Arsenault	888228f2b0	AMDGPU: Use early continue to reduce indentation	2022-12-22 12:38:59 -05:00
Jay Foad	7e1e993816	[AMDGPU] Remove permlane discard vdst_in optimization from isel D72845 implemented the equivalent IR optimization in InstCombine so it seems that there's no advantage to doing it during isel too. This partially reverts D72844. Differential Revision: https://reviews.llvm.org/D140546	2022-12-22 15:49:26 +00:00
Jay Foad	821c7be8e6	[AMDGPU] Simplify simplifyAMDGCNMemoryIntrinsicDemanded. NFC.	2022-12-22 11:50:04 +00:00
Matt Arsenault	4463badf46	AMDGPU: Use DenormalMode type in FP mode tracking This simplifies a future patch. The MIR handling should be fixed. We're still printing these in custom MachineFunctionInfo as bools (plus the inverted meaning is hard to follow).	2022-12-21 20:35:48 -05:00
Matt Arsenault	69e75ae695	CodeGen: Don't lazily construct MachineFunctionInfo This fixes what I consider to be an API flaw I've tripped over multiple times. The point this is constructed isn't well defined, so depending on where this is first called, you can conclude different information based on the MachineFunction. For example, the AMDGPU implementation inspected the MachineFrameInfo on construction for the stack objects and if the frame has calls. This kind of worked in SelectionDAG which visited all allocas up front, but broke in GlobalISel which hasn't visited any of the IR when arguments are lowered. I've run into similar problems before with the MIR parser and trying to make use of other MachineFunction fields, so I think it's best to just categorically disallow dependency on the MachineFunction state in the constructor and to always construct this at the same time as the MachineFunction itself. A missing feature I still could use is a way to access a custom analysis pass on the IR here.	2022-12-21 10:49:32 -05:00
Mirko Brkusanin	a80edb7fc9	[AMDGPU][GlobalISel] Fix mapping G_FREEZE Differential Revision: https://reviews.llvm.org/D140416	2022-12-21 15:25:04 +01:00
Christudasan Devadasan	a3028239a7	Revert "[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs" This reverts commit `40ba0942e2`.	2022-12-21 16:17:42 +05:30
Piotr Sobczak	cce3cd203e	[AMDGPU][MC][NFC] MUBUF/MTBUF code cleanup Refactor code to reduce code duplication and improve maintainability. - Extract BUF_Pseudo common base class - Refactor getMUBUFInsDA - Refactor getMUBUFAtomicInsDA - Refactor getMTBUFInsDA - Refactor getMUBUFAsmOps - Refactor getMTBUFAsmOps Differential Revision: https://reviews.llvm.org/D140410	2022-12-21 10:09:05 +01:00
Nick Desaulniers	ad99774a5f	[llvm][PassSupport] don't require passes to be default constructible Quite a few passes are not default constructible. In order to properly support -{start|stop}-{before|after}= for these passes, we would like to continue to use INITIALIZE_PASS, but not necessarily provide a default constructor. Delete the default constructors of classes derived from SelectionDAGISel. Link: https://github.com/llvm/llvm-project/issues/59538 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D140349	2022-12-20 14:07:29 -08:00
Leon Clark	daa022ca57	Enable roundeven. Add support for roundeven and implement appropriate tests. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D137954	2022-12-20 15:40:20 +00:00
Archibald Elliott	f09cf34d00	[Support] Move TargetParsers to new component This is a fairly large changeset, but it can be broken into a few pieces: - `llvm/Support/TargetParser` are all moved from the LLVM Support component into a new LLVM Component called "TargetParser". This potentially enables using tablegen to maintain this information, as is shown in https://reviews.llvm.org/D137517. This cannot currently be done, as llvm-tblgen relies on LLVM's Support component. - This also moves two files from Support which use and depend on information in the TargetParser: - `llvm/Support/Host.{h,cpp}` which contains functions for inspecting the current Host machine for info about it, primarily to support getting the host triple, but also for `-mcpu=native` support in e.g. Clang. This is fairly tightly intertwined with the information in `X86TargetParser.h`, so keeping them in the same component makes sense. - `llvm/ADT/Triple.h` and `llvm/Support/Triple.cpp`, which contains the target triple parser and representation. This is very intertwined with the Arm target parser, because the arm architecture version appears in canonical triples on arm platforms. - I moved the relevant unittests to their own directory. And so, we end up with a single component that has all the information about the following, which to me seems like a unified component: - Triples that LLVM Knows about - Architecture names and CPUs that LLVM knows about - CPU detection logic for LLVM Given this, I have also moved `RISCVISAInfo.h` into this component, as it seems to me to be part of that same set of functionality. If you get link errors in your components after this patch, you likely need to add TargetParser into LLVM_LINK_COMPONENTS in CMake. Differential Revision: https://reviews.llvm.org/D137838	2022-12-20 11:05:50 +00:00
Carl Ritson	5bc703f755	[AMDGPU] Replace getPhysRegClass with getPhysRegBaseClass Accelerate finding the base class for a physical register by building a static mapping table from physical registers to base classes using TableGen. Replace uses of SIRegisterInfo::getPhysRegClass with TargetRegisterInfo::getPhysRegBaseClass in order to use the computed table. Reviewed By: arsenm, foad Differential Revision: https://reviews.llvm.org/D139422	2022-12-20 16:22:14 +09:00
Sameer Sahasrabuddhe	475ce4c200	RFC: Uniformity Analysis for Irreducible Control Flow Uniformity analysis is a generalization of divergence analysis to include irreducible control flow: 1. The proposed spec presents a notion of "maximal convergence" that captures the existing convention of converging threads at the headers of natural loops. 2. Maximal convergence is then extended to irreducible cycles. The identity of irreducible cycles is determined by the choices made in a depth-first traversal of the control flow graph. Uniformity analysis uses criteria that depend only on closed paths and not cycles, to determine maximal convergence. This makes it a conservative analysis that is independent of the effect of DFS on CycleInfo. 3. The analysis is implemented as a template that can be instantiated for both LLVM IR and Machine IR. Validation: - passes existing tests for divergence analysis - passes new tests with irreducible control flow - passes equivalent tests in MIR and GMIR Based on concepts originally outlined by Nicolai Haehnle <nicolai.haehnle@amd.com> With contributions from Ruiling Song <ruiling.song@amd.com> and Jay Foad <jay.foad@amd.com>. Support for GMIR and lit tests for GMIR/MIR added by Yashwant Singh <yashwant.singh@amd.com>. Differential Revision: https://reviews.llvm.org/D130746	2022-12-20 07:22:24 +05:30
Ivan Kosarev	85dada81e3	[AMDGPU][CodeGen] Support raw format TFE buffer loads other than byte, short and d16 ones. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D138215	2022-12-19 11:39:08 +00:00
Sergei Barannikov	4d48ccfc88	[MC] Use `MCRegister` instead of `unsigned` in `MCTargetAsmParser` Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D140273	2022-12-18 12:12:05 -08:00

7550 Commits