Conversation
The key of this is the API, as shown with examples in test_sparse.f90. In particular, I am proposing the following functions / API: Higher level API: Conversion Dense <-> COO: * dense2coo * coo2dense Conversion COO -> CSR: * coo2csr CSR functionality: * csr_matvec * csr_getvalue Dense functionality: * getnnz Lower level API: * csr_has_canonical_format * csr_sum_duplicates * csr_sort_indices In these algorithms, it seems it is possible to just have one (best) implementation with one exception: sorting of indices (as implemented by `csr_sort_indices`), where different algorithms give different performance and it depends on the actual matrix. As such, it might make sense to provide a default in `csr_sort_indices` that is called by `coo2csr` that is as fast as we can make it for most cases (currently it uses quicksort, but there are other faster algoritms for our use case that we should switch over) and then provide other implementations such as `csr_sort_indices_mergesort`, `csr_sort_indices_timsort`, etc. for users so that they can choose the algorithm for sorting indices, or even provide their own. The file stdlib_experimental_sparse.f90 provides an example implementation of these algorithms that was inspired by SciPy. We can also discuss the details of that, but the key that I would like to discuss is the API itself.
@sfilippone I would really be interested in your feedback on the API, based on your work on psblas3. This is a serial API, which I think we should have, in addition to a parallel API that we'll tackle later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, I have a somewhat related project https://.com/siesta-project/buds
which also has CSR matrices and some quite nice features I think.
I would be very happy to move this into stdlib in one way or the other.
Some notes:
I think such a complex data structure should be hidden in a type (class) to allow more stuff(discussions in link)- Some times it may be beneficial if the data contained isn't necessarily a single dimension (consider a 3D matrix with 2D sparsity and 1D dense).
- Many external interfaces (in C) require 0-based indexing, and hence it would be nice with a retrieval method of the
type
to a 0-based index (basically only doing this on the index arrays and returning with a pointer to the actual data array) - My implementation allows very efficient addition of elements by having an additional counter. This allows one to have empty elements in each row and thus only re-allocate when one tries to add more terms than this. If some forms of the csr matrix becomes an issue it may be beneficial to have a 2nd format that has a list (each row) of lists (with data). In this way one need only reallocate the row that needs to be bigger.
I probably have some more comments if you want?
real(dp), intent(in) :: Ax(:), x(:) | ||
real(dp) :: y(size(Ap)-1) | ||
integer :: i | ||
!$omp parallel default(none) shared(Ap, Aj, Ax, x, y) private(i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally I think omp
should be orphaned. In that way it is the user who controls the parallelization, and not the stdlib.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good comment. Let's discuss openmp in #213.
I forgot that I used openmp here, I am completely fine to remove it, that is an orthogonal issue to sparse matrices. If we decide to use openmp in stdlib, we can keep it here, if we decide not to, we can remove it. And I am happy to remove it to merge this PR, and we can always add it back in. Let's discuss more in #213.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great. :)
end do | ||
end subroutine | ||
integer function getnnz(B) result(nnz) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, perhaps these methods should allow integer(8)
to allow very large matrices.
Perhaps this should be get_nnz
?
!$omp parallel default(none) shared(Ap, Aj, Ax, x, y) private(i) | ||
!$omp do | ||
do i = 1, size(Ap)-1 | ||
y(i) = dot_product(Ax(Ap(i):Ap(i+1)-1), x(Aj(Ap(i):Ap(i+1)-1))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will create a temporary which could potentially be very large. I would try to avoid this since the temporary still needs to traverse the copied elements.
end function | ||
real(dp) function csr_getvalue(Ap, Aj, Ax, i, j) result(r) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
csr_get_value
?
I have never really liked get
when intent is obvious, I may be wrong.
But csr_value
is just as clear to me. ;)
Great comments! Thank you. I'll reply later, I'll just point out a general design that we figured out that will satisfy almost everybody in the community: have a low level API that is procedural with no side effects (some people call it functional) as in this PR. And then have a high level API that is object oriented and hides complexity in a class or derived types and that uses the low level API to do the work. If you want, you can help suggest some good high level API for this. …On Sat, May 16, 2020, at 11:38 AM, Nick R. Papior wrote: ***@***.**** commented on this pull request. FYI, I have a somewhat related project https://.com/siesta-project/buds which also has CSR matrices and some quite nice features I think. I would be very happy to move this into stdlib in one way or the other. Some notes: * I think such a *complex* data structure should be hidden in a type (class) to allow more stuff * Some times it may be beneficial if the data contained isn't necessarily a single dimension (consider a 3D matrix with 2D sparsity and 1D dense). * Many external interfaces (in C) require 0-based indexing, and hence it would be nice with a retrieval method of the `type` to a 0-based index (basically only doing this on the index arrays and returning with a pointer to the actual data array) * My implementation allows very efficient addition of elements by having an additional counter. This allows one to have *empty* elements in each row and thus only re-allocate when one tries to add more terms than this. If some forms of the csr matrix becomes an issue it *may* be beneficial to have a 2nd format that has a list (each row) of lists (with data). In this way one need only reallocate the row that needs to be bigger. I probably have some more comments if you want? In src/stdlib_experimental_sparse.f90 <#189 (comment)>: > + r1 = Ap(i) + r2 = Ap(i+1)-1 + l = r2-r1+1 + idx(:l) = iargsort_quicksort(Aj(r1:r2)) + Aj(r1:r2) = Aj(r1+idx(:l)-1) + Ax(r1:r2) = Ax(r1+idx(:l)-1) +end do +end subroutine + +function csr_matvec(Ap, Aj, Ax, x) result(y) +! Compute y = A*x for CSR matrix A and dense vectors x, y +integer, intent(in) :: Ap(:), Aj(:) +real(dp), intent(in) :: Ax(:), x(:) +real(dp) :: y(size(Ap)-1) +integer :: i +!$omp parallel default(none) shared(Ap, Aj, Ax, x, y) private(i) Generally I think `omp` should be orphaned. In that way it is the user who controls the parallelization, and not the stdlib. In src/stdlib_experimental_sparse.f90 <#189 (comment)>: > +integer, intent(out) :: Ai(:), Aj(:) +real(dp), intent(out) :: Ax(:) +integer :: i, j, idx +idx = 1 +do j = 1, size(B, 2) + do i = 1, size(B, 1) + if (abs(B(i, j)) < tiny(1._dp)) cycle + Ai(idx) = i + Aj(idx) = j + Ax(idx) = B(i, j) + idx = idx + 1 + end do +end do +end subroutine + +integer function getnnz(B) result(nnz) Also, perhaps these methods should allow `integer(8)` to allow very large matrices. Perhaps this should be `get_nnz`? In src/stdlib_experimental_sparse.f90 <#189 (comment)>: > + idx(:l) = iargsort_quicksort(Aj(r1:r2)) + Aj(r1:r2) = Aj(r1+idx(:l)-1) + Ax(r1:r2) = Ax(r1+idx(:l)-1) +end do +end subroutine + +function csr_matvec(Ap, Aj, Ax, x) result(y) +! Compute y = A*x for CSR matrix A and dense vectors x, y +integer, intent(in) :: Ap(:), Aj(:) +real(dp), intent(in) :: Ax(:), x(:) +real(dp) :: y(size(Ap)-1) +integer :: i +!$omp parallel default(none) shared(Ap, Aj, Ax, x, y) private(i) +!$omp do +do i = 1, size(Ap)-1 + y(i) = dot_product(Ax(Ap(i):Ap(i+1)-1), x(Aj(Ap(i):Ap(i+1)-1))) This will create a temporary which *could* potentially be very large. I would try to avoid this since the temporary still needs to traverse the copied elements. In src/stdlib_experimental_sparse.f90 <#189 (comment)>: > +l = 1 +i = size(A) +! Now we always have A(l) < val; A(i) >= val and we must make sure that "i" is +! the lowest possible such index. +do while (l + 1 < i) + idx = (l+i) / 2 + if (A(idx) < val) then + l = idx + else + i = idx + end if +end do +end function + + +real(dp) function csr_getvalue(Ap, Aj, Ax, i, j) result(r) `csr_get_value`? I have never really liked `get` when intent is obvious, I may be wrong. But `csr_value` is just as clear to me. ;) — You are receiving this because you authored the thread. Reply to this email directly, view it on <#189 (review)>, or unsubscribe <https://.com/notifications/unsubscribe-auth/AAAFAWDCCRCJHEQQTQ6PAFTRR3FSPANCNFSM4NC62JUA>. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The different API look good to me.
My major concern is about the declaration of the pointer array of CSR. This array can easily contain values larger than integer(int32)
. Therefore, I wonder if it would not be advisable to declare it as integer(int64)
.
end subroutine | ||
integer function getnnz(B) result(nnz) | ||
real(dp), intent(in) :: B(:, :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for dense matrices.
Since we are in a module for sparse matrices, should it return nnz only for sparse matrices?
integer :: n | ||
B = 0 | ||
do n = 1, size(Ai) | ||
B(Ai(n), Aj(n)) = B(Ai(n), Aj(n)) + Ax(n) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implies that all elements in Ai
and Aj
are filled, which is not always the case because COO is often used as a temporary format.
integer, intent(in) :: Ai(:), Aj(:) | ||
real(dp), intent(in) :: Ax(:) | ||
real(dp), intent(out) :: B(:, :) | ||
integer :: n |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest to always use integer(int64)
when going through a sparse matrix.
! Duplicate entries are carried over to the CSR representation. | ||
integer, intent(in) :: Ai(:), Aj(:) | ||
real(dp), intent(in) :: Ax(:) | ||
integer, intent(out) :: Bp(:), Bj(:) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to declare Bp(:)
as integer(int64)
to avoid any troubles with large sparse matrices.
However, we may want to be compatible to e.g., psblas3 (I didn't check what is the default declaration in psblas3).
verbose_ = .false. | ||
if (present(verbose)) verbose_ = verbose | ||
allocate(Bp(maxval(Ai)+1)) | ||
if (verbose_) print *, "coo2csr" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
optval
can be used here ;)
call csr_sum_duplicates(Bp, Bj_, Bx_) | ||
if (verbose_) print *, "done" | ||
nnz = Bp(size(Bp))-1 | ||
allocate(Bj(nnz), Bx(nnz)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line can be removed with Fortran2003 and>, right?
subroutine csr_sum_duplicates(Ap, Aj, Ax) | ||
! Sum together duplicate column entries in each row of CSR matrix A | ||
! The column indicies within each row must be in sorted order. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
! The column indicies within each row must be in sorted order. | |
! The column indices within each row must be in sorted order. |
end do | ||
end subroutine | ||
function csr_matvec(Ap, Aj, Ax, x) result(y) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to follow the API of Sparse BLAS.
r = 0 | ||
end function | ||
pure elemental subroutine swap_int(x,y) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this subroutine should be moved to another module of stdlib
.
call check(all(abs(Bx - [5, 7, 1, 3]) < 1e-12_dp)) | ||
call check(csr_has_canonical_format(Bp, Bj)) | ||
call check(abs(csr_getvalue(Bp, Bj, Bx, 1, 1) - 5) < tiny(1._dp)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By experience, tiny(1._dp)
can be too severe.
Re: int32 vs int64. You'll need both. Let me elaborate. The most common application of sparse matrices is in the inner steps of some PDE solver, dealing with a discretization on a mesh. Such problems can indeed reach sizes that need indices with int64; however in those cases they are tackled through parallel solution strategies mostly based on MPI. The parallelization strategy is to assign a subdomain to each MPI process; each subdomain is most likely of a size that fits comfortably in an int32, provided that you apply a local numbering scheme, as well as a global numbering scheme. And you occasionally need to convert the indices of the local portion into global numbering for some processing, and back; examples include data exchanges among processes within a transpose operation. The solution that I have now in the development branch of PSBLAS is the following: there are 4 integer kinds: int32==psb_mpk_ <= psb_ipk_ <= psb_lpk_ <= psb_epk_== int64 IPK is the integer kind used for *LOCAL indices, which are the ones for the local portion of the matrices involved in numerical operators such as Matrix-Vector product, LPK is the integer kind used for GLOBAL indices, which are used in pre- and post-processing. IPK and LPK can be specified at configure time within the constraints abov, with the default being IPK=int32 and LPK=int64. Salvatore …On Sat, May 16, 2020 at 11:32 PM Jeremie Vandenplas < ***@***.***> wrote: ***@***.**** commented on this pull request. The different API look good to me. My major concern is about the declaration of the pointer array of CSR. This array can easily contain values larger than integer(int32). Therefore, I wonder if it would not be advisable to declare it as integer(int64). ------------------------------ In src/stdlib_experimental_sparse.f90 <#189 (comment)>: > +real(dp), intent(out) :: Ax(:) +integer :: i, j, idx +idx = 1 +do j = 1, size(B, 2) + do i = 1, size(B, 1) + if (abs(B(i, j)) < tiny(1._dp)) cycle + Ai(idx) = i + Aj(idx) = j + Ax(idx) = B(i, j) + idx = idx + 1 + end do +end do +end subroutine + +integer function getnnz(B) result(nnz) +real(dp), intent(in) :: B(:, :) This is for dense matrices. Since we are in a module for sparse matrices, should it return nnz only for sparse matrices? ------------------------------ In src/stdlib_experimental_sparse.f90 <#189 (comment)>: > + if (abs(B(i, j)) < tiny(1._dp)) cycle + nnz = nnz + 1 + end do +end do +end function + +! COO + +subroutine coo2dense(Ai, Aj, Ax, B) +integer, intent(in) :: Ai(:), Aj(:) +real(dp), intent(in) :: Ax(:) +real(dp), intent(out) :: B(:, :) +integer :: n +B = 0 +do n = 1, size(Ai) + B(Ai(n), Aj(n)) = B(Ai(n), Aj(n)) + Ax(n) This implies that all elements in Ai and Aj are filled, which is not always the case because COO is often used as a temporary format. ------------------------------ In src/stdlib_experimental_sparse.f90 <#189 (comment)>: > +nnz = 0 +do j = 1, size(B, 2) + do i = 1, size(B, 1) + if (abs(B(i, j)) < tiny(1._dp)) cycle + nnz = nnz + 1 + end do +end do +end function + +! COO + +subroutine coo2dense(Ai, Aj, Ax, B) +integer, intent(in) :: Ai(:), Aj(:) +real(dp), intent(in) :: Ax(:) +real(dp), intent(out) :: B(:, :) +integer :: n I would suggest to always use integer(int64) when going through a sparse matrix. ------------------------------ In src/stdlib_experimental_sparse.f90 <#189 (comment)>: > +real(dp), intent(in) :: Ax(:) +real(dp), intent(out) :: B(:, :) +integer :: n +B = 0 +do n = 1, size(Ai) + B(Ai(n), Aj(n)) = B(Ai(n), Aj(n)) + Ax(n) +end do +end subroutine + +subroutine coo2csr(Ai, Aj, Ax, Bp, Bj, Bx) +! Converts from COO (Ai, Aj, Ax) into CSR (Bp, Bj, Bx) +! Row and column indices are *not* assumed to be ordered. +! Duplicate entries are carried over to the CSR representation. +integer, intent(in) :: Ai(:), Aj(:) +real(dp), intent(in) :: Ax(:) +integer, intent(out) :: Bp(:), Bj(:) I suggest to declare Bp(:) as integer(int64) to avoid any troubles with large sparse matrices. However, we may want to be compatible to e.g., psblas3 (I didn't check what is the default declaration in psblas3). ------------------------------ In src/stdlib_experimental_sparse.f90 <#189 (comment)>: > +! Converts from COO (Ai, Aj, Ax) into CSR (Bp, Bj, Bx) +! Row and column indices are *not* assumed to be ordered. +! Duplicate entries are summed up and the indices are ordered. +integer, intent(in) :: Ai(:), Aj(:) +real(dp), intent(in) :: Ax(:) +integer, allocatable, intent(out) :: Bp(:), Bj(:) +real(dp), allocatable, intent(out) :: Bx(:) +logical, optional, intent(in) :: verbose +integer :: Bj_(size(Ai)) +real(dp) :: Bx_(size(Ai)) +integer :: nnz +logical :: verbose_ +verbose_ = .false. +if (present(verbose)) verbose_ = verbose +allocate(Bp(maxval(Ai)+1)) +if (verbose_) print *, "coo2csr" optval can be used here ;) ------------------------------ In src/stdlib_experimental_sparse.f90 <#189 (comment)>: > +integer :: Bj_(size(Ai)) +real(dp) :: Bx_(size(Ai)) +integer :: nnz +logical :: verbose_ +verbose_ = .false. +if (present(verbose)) verbose_ = verbose +allocate(Bp(maxval(Ai)+1)) +if (verbose_) print *, "coo2csr" +call coo2csr(Ai, Aj, Ax, Bp, Bj_, Bx_) +if (verbose_) print *, "csr_sort_indices" +call csr_sort_indices(Bp, Bj_, Bx_) +if (verbose_) print *, "csr_sum_duplicates" +call csr_sum_duplicates(Bp, Bj_, Bx_) +if (verbose_) print *, "done" +nnz = Bp(size(Bp))-1 +allocate(Bj(nnz), Bx(nnz)) This line can be removed with Fortran2003 and>, right? ------------------------------ In src/stdlib_experimental_sparse.f90 <#189 (comment)>: > +! conditions facilitate faster matrix computations. +integer, intent(in) :: Ap(:), Aj(:) +integer :: i, j +r = .false. +do i = 1, size(Ap)-1 + if (Ap(i) > Ap(i+1)) return + do j = Ap(i)+1, Ap(i+1)-1 + if (Aj(j-1) >= Aj(j)) return + end do +end do +r = .true. +end function + +subroutine csr_sum_duplicates(Ap, Aj, Ax) +! Sum together duplicate column entries in each row of CSR matrix A +! The column indicies within each row must be in sorted order. ⬇️ Suggested change -! The column indicies within each row must be in sorted order. +! The column indices within each row must be in sorted order. ------------------------------ In src/stdlib_experimental_sparse.f90 <#189 (comment)>: > +subroutine csr_sort_indices(Ap, Aj, Ax) +! Sort CSR column indices inplace +integer, intent(inout) :: Ap(:), Aj(:) +real(dp), intent(inout) :: Ax(:) +integer :: i, r1, r2, l, idx(size(Aj)) +do i = 1, size(Ap)-1 + r1 = Ap(i) + r2 = Ap(i+1)-1 + l = r2-r1+1 + idx(:l) = iargsort_quicksort(Aj(r1:r2)) + Aj(r1:r2) = Aj(r1+idx(:l)-1) + Ax(r1:r2) = Ax(r1+idx(:l)-1) +end do +end subroutine + +function csr_matvec(Ap, Aj, Ax, x) result(y) I suggest to follow the API of Sparse BLAS. ------------------------------ In src/stdlib_experimental_sparse.f90 <#189 (comment)>: > +real(dp), intent(in) :: Ax(:) +integer, intent(in) :: i, j +integer :: row_start, row_end, offset +row_start = Ap(i) +row_end = Ap(i+1)-1 +offset = lower_bound(Aj(row_start:row_end), j) + row_start - 1 +if (offset <= row_end) then + if (Aj(offset) == j) then + r = Ax(offset) + return + end if +end if +r = 0 +end function + +pure elemental subroutine swap_int(x,y) I think this subroutine should be moved to another module of stdlib. ------------------------------ In src/tests/sparse/test_sparse.f90 <#189 (comment)>: > +call check(all(Bj == [1, 1, 3, 3, 4, 2])) +call check(all(abs(Bx - [1, 4, 2, 5, 1, 3]) < 1e-12_dp)) +call csr_sum_duplicates(Bp, Bj, Bx) +nnz = Bp(size(Bp))-1 +call check(all(Bp == [1, 2, 4, 5])) +call check(all(Bj(:nnz) == [1, 3, 4, 2])) +call check(all(abs(Bx(:nnz) - [5, 7, 1, 3]) < 1e-12_dp)) +call check(csr_has_canonical_format(Bp, Bj(:nnz))) + +call coo2csr_canonical(Ai, Aj, Ax, Bp, Bj, Bx) +call check(all(Bp == [1, 2, 4, 5])) +call check(all(Bj == [1, 3, 4, 2])) +call check(all(abs(Bx - [5, 7, 1, 3]) < 1e-12_dp)) +call check(csr_has_canonical_format(Bp, Bj)) + +call check(abs(csr_getvalue(Bp, Bj, Bx, 1, 1) - 5) < tiny(1._dp)) By experience, tiny(1._dp) can be too severe. — You are receiving this because you were mentioned. Reply to this email directly, view it on <#189 (review)>, or unsubscribe <https://.com/notifications/unsubscribe-auth/AD274T4OGJKXS6GJTVKGM3TRR4A7RANCNFSM4NC62JUA> . |
Thank you Salvatore for your input. I should have a closer look to psblas. For |
I am OK with integrating PSBLAS in stdlib (with proper handling of copyright for me and my collaborators), at least the parts that make sense today (I suspect I have a lot of stuff that should not be included just yet). In the project there are a set of facilities marked at SERIAL (admittedly, they should be packaged better because they are split in two-three subdirs) which are the ones you are probably most interested in. Note that I also use a home-grown set of sed scripts to have a crude emulation of macros/templates to avoid rewriting things multiple times; if that could be morphed into something robust I'd be more than happy, while waiting for the Fortran standard committee to decide on a template facility (I have seen more than one proposal in this direction, but it's not yet decided AFAIK). A few additional comments: 1. Storage formats. I have a set of additional storage formats in psblas3-ext; they also support CUDA, accessed through the C interoperability. 2. Coarrays. I have a preliminary version with CoArrays (in fact, I am one of the founders of the OpenCoArrays project which is used to provide CoArrays in GNU). However, there is a big issue and that is constraints C825. The effect of the constraint is that the set of entities that can have a coarray component is fixed at compile time; in other words, you cannot have an allocatable array of derived types that contains a(n allocatable) coarray component. I have reported this to the Fortran standard committee (see https://j3-fortran.org/doc/year/18/18-280r1.txt ) and indeed my proposal has been approved; however there is no telling when it will appear in compilers, hence I am not going to use this in production code any time soon. Salvatore …On Sun, May 17, 2020 at 9:19 AM Jeremie Vandenplas ***@***.***> wrote: Re: int32 vs int64. You'll need both. Let me elaborate. The most common application of sparse matrices is in the inner steps of some PDE solver, dealing with a discretization on a mesh. Such problems can indeed reach sizes that need indices with int64; however in those cases they are tackled through parallel solution strategies mostly based on MPI. The parallelization strategy is to assign a subdomain to each MPI process; each subdomain is most likely of a size that fits comfortably in an int32, provided that you apply a local numbering scheme, as well as a global numbering scheme. And you occasionally need to convert the indices of the local portion into global numbering for some processing, and back; examples include data exchanges among processes within a transpose operation. The solution that I have now in the development branch of PSBLAS is the following: there are 4 integer kinds: int32==psb_mpk_ <= psb_ipk_ <= psb_lpk_ <= psb_epk_== int64 IPK is the integer kind used for *LOCAL indices, which are the ones for the local portion of the matrices involved in numerical operators such as Matrix-Vector product, LPK is the integer kind used for GLOBAL indices, which are used in pre- and post-processing. IPK and LPK can be specified at configure time within the constraints abov, with the default being IPK=int32 and LPK=int64. Salvatore Thank you Salvatore for your input. I should have a closer look to psblas. For stdlib: would it not be of interest to integrate psblas3 (if allowed) in stdlib directly? Or through fpm? In comparison to the other implementations, psblas3 is well tested, is associated to peer-reviewed publications and is parallelized (also with Coarray Fortran?) if we want to support parallelization in stdlib. If something is missing, we could identify it and maybe collaborate with psblas to implement the missing parts there? This is just a question to be sure that we don't start into something that is similar to reinventing the wheel. — You are receiving this because you were mentioned. Reply to this email directly, view it on <#189 (comment)>, or unsubscribe <https://.com/notifications/unsubscribe-auth/AD274T4E5LRQPHUNASHTN53RR6FW3ANCNFSM4NC62JUA> . |
Finally got to my computer. Thanks everybody for the feedback, I'll first address the high level feedback: do we want this at all, or should stdlib just use psblas3? My own view is that stdlib should not be competing with or reimplementing well established libraries, whether LAPACK, psblas3, or others. Just linking against psblas3 but not providing any additional functionality or API I think is also not worth it. As the Fortran Package Manager becomes more successful, it will be very easy to depend on any library such as psblas3 in users projects, and we should be encouraging people to do exactly that. It will be good for users, good for developers of such libraries and the whole ecosystem. Where I see the role of stdlib is to standardize API and provide a reference implementation to functionalities that are repeated from project to project with slightly different API each. There are many implementations of CSR and other functionality, see #38 for a partial list, and there are many codes that use this in some form. We should see if there is a way to figure out an API that each of these codes could use. The other aspect where I see stdlib as contributing is when people just want to quickly use some functionality in their research / new projects, or later on interactively in the Jupyter notebook for example, having a nice and well documented API that is very easy to use for new users would be beneficial. Later on, for big specialized production codes, one can always use specialized libraries. As an example, most electronic structure codes have their own custom specialized eigensolver. But I think it is still worth having a nice API to dense eigensolver via Lapack, and a sparse eigensolver on top of this sparse functionality, so that people who just want to some algorithm in Fortran can do so as easily as with SciPy, Matlab or Julia. In order to move the conversation forward, we have to have both non-OO low level api as well as a high level OO API. The reason is that I know many people (myself included) who do not want to use the OO API, just simple subroutines that do the job. But I also know a lot of people who would prefer the OO API. By providing both, we can capture most of the Fortran community and make stdlib useful for almost everybody. @sfilippone at the very least I can see that we can use your sorting algorithms. I am hoping there is more that we can figure out how to reuse. Also, I just want to start with a serial implementation, we can overload the subroutines to work with both int32 and int64 etc. I am really hoping we can figure out an API that we would all agree would make sense to have in stdlib. |
@certik I totally agree with your comments.However, I wonder if this could be achieved by interfacing with the serial version of psblas, even partially. If it is not possible (e.g., because of no non-OO low level API), then I am fine with this, and I will help as good as I can. |
I don't think that stdlib should just gulp up existing libraries and expose them under different name. First, decide if sparse matrix algebra is in scope or not. I'm not the target user (although I use SMM through ESMF under the hood daily), but if we agreed that a SciPy-like functionality is in scope, then this is too. Second, if you do add it to stdlib, I think that it should provide added value over any existing library. In this case, I see that the added value could be ease of installation, and ease of use (simpler API). This resonated with me personally the most:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can go forward, pending any unresolved requests for changes from other reviewers.
What is the current status of this ? We have a couple of merge conflicts in the build files, which need a merge/rebase against the default branch. Also, there are a couple of unaddressed review comments. @certik, is there anything we can do to help you move this pull request forward? |
@certik would you like to continue working on this or should we close it for the time being? |
@milancurcic are you against this approach to sparse arrays? If not, then let's keep it open. While I don't have time to finish this myself right now, others should help. If we close it, then it will be harder for people to know about it. |
We can keep it open. What I'm asking is whether you intend to continue working on this eventually, when you have more time. I ask this because it wasn't clear to me whether you thought this PR can go forward with the changes requested. In essence, is this PR stale due to lack of time or lack of consensus? What should be tackled next if someone else were to pick it up? Would you be able to review the changes if we found a person to take over this PR? |
I reread all comments. I don't see any major disagreements. What has to be done:
I don't have the time right now, but I want to finish this, or help others finish it. This will be a nice selling point of stdlib to have these basic sparse routines in, nicely documented and very performing. |
I wrote
So I disagree with the statement:
Furthermore, I think marking things as "wontfix" is not encouraging to others (or me!) to pick it up and finish it. For this reason, and given that my stated intention is to finish this, I have removed the "wontfix" label. We need to help each other finish these things and get them into stdlib to move stdlib forward. You don't want such COO functionality in there? If you do, then let's plan how to do it. |
Be assured that I want to move project forward. Frankly, I don't think this can be moved forward in its current form, even after resolving the merge conflicts. I have been looking into this a couple of times now, and stopped again because it has significantly diverged from the latest default branch, which makes it quite involved to work with the implementation because a lot of changes in the general structure and build files are just missing. Instead I would suggest to salvage the implementation and start fresh from the latest default branch, addressing the feedback provided already. We have done so in the past with other es like the stringlist and were quite successful with this approach. |
@awvwgk awesome. I know that's not what you meant and that you want to move the project forward. But that is how I perceive it and I think many others too. To move forward, I think the label that you just used "help wanted" is perfect. Let's keep this PR open until a better one is up. |
In the following
instead of using please declare a parameter: real(dp), parameter :: zero=tiny(1.0_dp)
...
if (abs(B(i, j)) < zero ) cycle
... |
In the following subroutine
you can exploit do concurrent(n = 1:size(Ai))
B(Ai(n), Aj(n)) = B(Ai(n), Aj(n)) + Ax(n)
end do |
Following #760, I guess this PR could be closed? (Yep, I'm digging very far into the list of open PR to just because I think it'd be good to close those which are no longer relevant or stalled for too long just to increase the clarity of what is currently being worked on in |
The key of this is the API, as shown with examples in test_sparse.f90.
In particular, I am proposing the following functions / API:
Conversion Dense <-> COO:
Conversion COO -> CSR:
CSR functionality:
Dense functionality:
In these algorithms, it seems it is possible to just have one (best)
implementation with one exception: sorting of indices (as implemented by
csr_sort_indices
), where different algorithms give differentperformance and it depends on the actual matrix. As such, it might make
sense to provide a default in
csr_sort_indices
that is called bycoo2csr_canonical
that is as fast as we can make it for most cases (currently ituses quicksort, but there are other faster algoritms for our use case
that we should switch over) and then provide other implementations such
as
csr_sort_indices_mergesort
,csr_sort_indices_timsort
, etc. forusers so that they can choose the algorithm for sorting indices, or even
provide their own.
The file stdlib_experimental_sparse.f90 provides an example
implementation of these algorithms that was inspired by SciPy. We can
also discuss the details of that, but the key that I would like to
discuss is the API itself.
See #38.