@jeanPerier explained the importance of converting box loads and stores into `memcpy`s instead of aggregate loads and stores, and I'll do my best to explain it here.

* [(godbolt link) Example comparing opt transformations on memcpys vs aggregate load/stores](https://godbolt.org/z/be7xM83cG)
* LLVM can reason about memcpys more effectively than about aggregate load/stores.
* This came up when others were discussing array descriptors for assumed-rank arrays passed to `bind(c)` subroutines, with the implication that the array descriptors are known to have lower bounds of 1 and that they are not pointer/allocatable types.
* [(godbolt link) Clang also uses memcpys so we should probably follow them, assuming the clang developers are generating what they know Opt will handle more effectively.](https://godbolt.org/z/YT4x7387W)
* This currently may not help much without the `nocapture` attribute being propagated to function calls, but [it looks like someone may do this soon (discourse link)](https://discourse.llvm.org/t/applying-the-nocapture-attribute-to-reference-passed-arguments-in-fortran-subroutines/81401/23) or I can do this in a follow-up patch.

Note on test `flang/test/Fir/embox-char.fir`: it looks like the original test was auto-generated. I wasn't sure which parts were especially important to test, so I regenerated it. If we want the updated version to look more like the old version, I'll make those changes.
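For readers who want a source-level picture of the Clang comparison in the list above, here is a minimal C sketch. The `fake_descriptor` struct and both function names are hypothetical stand-ins invented for illustration; the real flang descriptor layout is different. The assumption being illustrated is that Clang typically lowers a whole-struct assignment like the first function to an `llvm.memcpy` call rather than an aggregate load/store, so both functions should produce similar IR.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a Fortran array descriptor ("box").
   This is NOT the real flang layout; it only needs to be a
   multi-field aggregate large enough to be copied as a block. */
typedef struct {
    void    *base_addr;
    size_t   elem_len;
    int32_t  rank;
    int64_t  lower_bound;
    int64_t  extent;
    int64_t  stride;
} fake_descriptor;

/* Whole-struct assignment. Clang typically emits llvm.memcpy for this
   instead of loading and storing the struct as one aggregate value. */
void copy_by_assignment(fake_descriptor *dst, const fake_descriptor *src) {
    *dst = *src;
}

/* Explicit memcpy of the same bytes, for comparison. */
void copy_by_memcpy(fake_descriptor *dst, const fake_descriptor *src) {
    memcpy(dst, src, sizeof *dst);
}
```

Compiling this with `clang -S -emit-llvm` and comparing the two function bodies is one way to check the lowering locally; the godbolt links above remain the reference for the actual optimizer behavior.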
! Test delayed privatization for allocatables: `firstprivate`.

! RUN: split-file %s %t

! RUN: %flang_fc1 -emit-hlfir -fopenmp -mmlir --openmp-enable-delayed-privatization \
! RUN: -o - %t/test_ir.f90 2>&1 | FileCheck %s
! RUN: bbc -emit-hlfir -fopenmp --openmp-enable-delayed-privatization -o - %t/test_ir.f90 2>&1 |\
! RUN: FileCheck %s

!--- test_ir.f90
subroutine delayed_privatization_allocatable
  implicit none
  integer, allocatable :: var1

!$omp parallel firstprivate(var1)
  var1 = 10
!$omp end parallel
end subroutine

! CHECK-LABEL: omp.private {type = firstprivate}
! CHECK-SAME: @[[PRIVATIZER_SYM:.*]] : [[TYPE:!fir.ref<!fir.box<!fir.heap<i32>>>]] alloc {

! CHECK-NEXT: ^bb0(%[[PRIV_ARG:.*]]: [[TYPE]]):

! CHECK: } copy {
! CHECK: ^bb0(%[[PRIV_ORIG_ARG:.*]]: [[TYPE]], %[[PRIV_PRIV_ARG:.*]]: [[TYPE]]):

! CHECK-NEXT: %[[PRIV_BASE_VAL:.*]] = fir.load %[[PRIV_PRIV_ARG]]
! CHECK-NEXT: %[[PRIV_BASE_BOX:.*]] = fir.box_addr %[[PRIV_BASE_VAL]]
! CHECK-NEXT: %[[PRIV_BASE_ADDR:.*]] = fir.convert %[[PRIV_BASE_BOX]]
! CHECK-NEXT: %[[C0:.*]] = arith.constant 0 : i64
! CHECK-NEXT: %[[COPY_COND:.*]] = arith.cmpi ne, %[[PRIV_BASE_ADDR]], %[[C0]] : i64

! CHECK-NEXT: fir.if %[[COPY_COND]] {
! CHECK-NEXT: %[[ORIG_BASE_VAL:.*]] = fir.load %[[PRIV_ORIG_ARG]]
! CHECK-NEXT: %[[ORIG_BASE_ADDR:.*]] = fir.box_addr %[[ORIG_BASE_VAL]]
! CHECK-NEXT: %[[ORIG_BASE_LD:.*]] = fir.load %[[ORIG_BASE_ADDR]]
! CHECK-NEXT: hlfir.assign %[[ORIG_BASE_LD]] to %[[PRIV_PRIV_ARG]] realloc
! CHECK-NEXT: }

! RUN: %flang -c -emit-llvm -fopenmp -mmlir --openmp-enable-delayed-privatization \
! RUN: -o - %t/test_compilation_to_obj.f90 | \
! RUN: llvm-dis 2>&1 |\
! RUN: FileCheck %s -check-prefix=LLVM

!--- test_compilation_to_obj.f90

program compilation_to_obj
  real, allocatable :: t(:)

!$omp parallel firstprivate(t)
  t(1) = 3.14
!$omp end parallel

end program compilation_to_obj

! LLVM: @[[GLOB_VAR:[^[:space:]]+]]t = internal global

! LLVM: define internal void @_QQmain..omp_par
! LLVM: call void @llvm.memcpy.p0.p0.i32(ptr %{{.+}}, ptr @[[GLOB_VAR]]t, i32 48, i1 false)