rcpp - C++ AVX2 function pointers and std::function not working on Windows (but work on Linux) - Stack Overflow


I'm experiencing an issue where passing AVX2 functions through function pointers or std::function works fine on Linux, but crashes on Windows. Direct AVX2 operations work fine on both platforms.

Specifically, I have these AVX2 functions (dummy versions; my real functions are much more complicated), which work fine when called directly:

#include <immintrin.h>  // AVX/AVX2 intrinsics (__m256d, _mm256_*)

// Simple test function that just multiplies a vector by 2
__m256d test_simple_AVX2(const __m256d x) {
  const __m256d two = _mm256_set1_pd(2.0);
  const __m256d res = _mm256_mul_pd(x, two);
  return res;
}

// The scalar version for comparison
double test_simple_double(const double x) {
  const double res = 2.0 * x;
  return res;
}

Now, if I call test_simple_AVX2 directly, it works fine. However, I need to make the code more general: pass the function to another function and then call test_simple_AVX2 from within that function.

Specifically, like this:

template <typename T>
inline  void TEST_fn_AVX2_row_or_col_vector(    Eigen::Ref<T>  x_Ref,
                                                FuncAVX fn_AVX,
                                                FuncDouble  fn_double) {
  
            
            const int N = x_Ref.size();
            const int vect_size = 4;
            const double vect_siz_dbl = static_cast<double>(vect_size);
            const double N_dbl = static_cast<double>(N);
            const int N_divisible_by_vect_size = std::floor(N_dbl / vect_siz_dbl) * vect_size;
            
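            Eigen::Matrix<double, -1, 1> x_tail = Eigen::Matrix<double, -1, 1>::Zero(vect_size); // copy of the last vect_size elements
            if (N >= vect_size) {   // guard: for N < vect_size, N - vect_size is negative and x_Ref(i) would read out of bounds
              int counter = 0;
              for (int i = N - vect_size; i < N; ++i) {
                x_tail(counter) = x_Ref(i);
                counter += 1;
              } 
            }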
            
            if (N >= vect_size) {
                
                alignas(32) double buffer[4];  // using an aligned buffer for AVX operations
                
                for (int i = 0; i + vect_size <= N_divisible_by_vect_size; i += vect_size) {
                  
                  // Copy data to aligned buffer
                  for(int j = 0; j < vect_size; j++) {
                    buffer[j] = x_Ref(i + j);
                  }
                  
                  const __m256d AVX_array = _mm256_load_pd(buffer);
                  const __m256d AVX_array_out = fn_AVX(AVX_array); //// ERROR / ABORTED SESSION OCCURS HERE (so when calling "fn_AVX"). 
                  
                  //// HOWEVER, if done manually (w/o calling a separate function), then it works! i.e.:
                  // const __m256d two = _mm256_set1_pd(2.0);
                  // const __m256d AVX_array_out = _mm256_mul_pd(AVX_array, two);
                  
                  _mm256_store_pd(buffer, AVX_array_out);
                  
                  // Copy back to Eigen
                  for(int j = 0; j < vect_size; j++) {
                    x_Ref(i + j) = buffer[j];
                  }
                  
                }
                
                if (N_divisible_by_vect_size != N) {    // Handle remainder
                  int counter = 0;
                  for (int i = N - vect_size; i < N; ++i) {
                    x_Ref(i) =  fn_double(x_tail(counter));
                    counter += 1;
                  }
                }
                
            }  else {   // If N < vect_size, handle everything with scalar operations
              
              for (int i = 0; i < N; ++i) {
                x_Ref(i) = fn_double(x_Ref(i));
              }
              
            }
  
}
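
The index bookkeeping in the function above can also be done with integer arithmetic alone, avoiding the round trip through double and std::floor. A minimal sketch of that idea follows; ChunkPlan and plan_chunks are hypothetical names introduced here for illustration only.

```cpp
#include <cassert>

// Hypothetical helper type (illustration only): describes how a length-n
// vector splits into full SIMD chunks plus a scalar remainder.
struct ChunkPlan {
    int vector_end;  // first index NOT covered by full chunks (a multiple of width)
    int tail_count;  // number of trailing elements left over after the full chunks
};

// Integer-only equivalent of std::floor(N_dbl / vect_siz_dbl) * vect_size.
inline ChunkPlan plan_chunks(int n, int width) {
    assert(n >= 0 && width > 0);
    const int vector_end = (n / width) * width;  // integer division truncates
    return ChunkPlan{vector_end, n - vector_end};
}
```

With N = 10 and a width of 4, plan_chunks(10, 4) gives vector_end = 8 and tail_count = 2, so only indices 8 and 9 would fall outside the full chunks.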


I have tried defining "FuncAVX" both as a function pointer and as a std::function. Neither works on Windows, although both work just fine on Linux!

Here's my attempt using function pointers:

typedef __m256d (*FuncAVX)(const __m256d); /// not working (on Windows - fine on Linux)!!

Using std::function also doesn't work:

typedef std::function<__m256d(const __m256d)> FuncAVX; /// not working (on Windows - fine on Linux)!!
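
One variation that may be worth testing is to keep __m256d out of the callback signature entirely, so that no vector value is passed or returned by value across the indirect call. The sketch below is an illustration under assumptions, not a confirmed fix: Kernel4, ScalarFn, kernel_times_two, scalar_times_two and apply_in_chunks are hypothetical names, the callback reads and writes four doubles through plain pointers, and the AVX2 path is compiled only when __AVX2__ is defined (with a scalar fallback otherwise).

```cpp
#include <cstddef>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

// Hypothetical callback types: the vector kernel processes exactly 4 doubles
// (in -> out); no __m256d appears in either signature.
typedef void (*Kernel4)(const double* in, double* out);
typedef double (*ScalarFn)(double);

// Multiplies 4 doubles by 2. Uses unaligned load/store, so `in` and `out`
// need no special alignment.
inline void kernel_times_two(const double* in, double* out) {
#if defined(__AVX2__)
    const __m256d x = _mm256_loadu_pd(in);
    const __m256d two = _mm256_set1_pd(2.0);
    _mm256_storeu_pd(out, _mm256_mul_pd(x, two));
#else
    for (int j = 0; j < 4; ++j) out[j] = 2.0 * in[j];  // scalar fallback
#endif
}

inline double scalar_times_two(double x) { return 2.0 * x; }

// Applies `kernel` to each full chunk of 4 and `scalar` to the remaining
// elements, modifying x in place.
inline void apply_in_chunks(double* x, std::size_t n, Kernel4 kernel, ScalarFn scalar) {
    const std::size_t vector_end = (n / 4) * 4;
    for (std::size_t i = 0; i < vector_end; i += 4) {
        kernel(x + i, x + i);  // in place: the kernel reads its input before writing
    }
    for (std::size_t i = vector_end; i < n; ++i) {
        x[i] = scalar(x[i]);
    }
}
```

For an Eigen vector, x.data() and x.size() could be passed in, assuming the elements are contiguous in memory.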

So just to make it clear, when I do:

const __m256d AVX_array = _mm256_load_pd(buffer);
const __m256d two = _mm256_set1_pd(2.0);
const __m256d AVX_array_out = _mm256_mul_pd(AVX_array, two);
_mm256_store_pd(buffer, AVX_array_out);

It works fine (even on Windows). However, if I call the function ("fn_AVX") instead, it does not work on Windows, although it does work on Linux. That is, if I try this:

const __m256d AVX_array = _mm256_load_pd(buffer);
const __m256d AVX_array_out = fn_AVX(AVX_array); //// ERROR / ABORTED SESSION OCCURS HERE (so when calling "fn_AVX"). 
_mm256_store_pd(buffer, AVX_array_out);

It doesn't work on Windows.

Does anybody have any idea why this doesn't work on Windows?

Also, I have tried using the unaligned AVX intrinsics (i.e. _mm256_loadu_pd and _mm256_storeu_pd instead of _mm256_load_pd and _mm256_store_pd), and I still get the same issue!
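
Note that switching to _mm256_loadu_pd and _mm256_storeu_pd only changes the loads and stores written explicitly in the source; any copies of __m256d values that the compiler itself places on the stack (for example around a non-inlined call) are outside their control. To probe whether alignment plays a role at all, one option is to log whether a 32-byte-aligned local actually ends up on a 32-byte boundary at run time. A minimal diagnostic sketch, with is_aligned_to and stack_buffer_is_aligned as hypothetical helper names:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if address `p` is a multiple of `alignment` bytes.
inline bool is_aligned_to(const void* p, std::size_t alignment) {
    // Route the pointer through a volatile so the compiler cannot fold the
    // check to a constant based on the declared alignment.
    const void* volatile opaque = p;
    return reinterpret_cast<std::uintptr_t>(opaque) % alignment == 0;
}

// Checks a local buffer declared the same way as the one in the question.
inline bool stack_buffer_is_aligned() {
    alignas(32) double buffer[4] = {0.0, 0.0, 0.0, 0.0};
    return is_aligned_to(buffer, 32);
}
```

Inside an Rcpp build, the result could be printed with Rcpp::Rcout on both platforms and compared.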

More info: I'm using C++ via Rcpp. Compiler: g++. Compiler flags: -O3 -march=znver3 -mtune=znver3 -fPIC -D_REENTRANT -DSTAN_THREADS -pthread -fpermissive -mfma -mavx -mavx2 -flarge-source-files
