Project roadmap and task tracking for the nfx-cpu library.
Todo
- Extended SIMD detection:
- SSE3, SSSE3, SSE4.1 detection
- Add hasSSE3Support(), hasSSSE3Support(), hasSSE41Support()
- Add unit tests in TESTS_FeatureDetection.cpp
- Add benchmarks in BM_FeatureDetection.cpp
- FMA (Fused Multiply-Add) detection
- Add verifyFmaSupport() compile-time helper
- Add unit tests and benchmarks
- AVX-512 detection (F, CD, ER, PF, BW, DQ, VL, IFMA, VBMI variants)
- Add granular AVX-512 variant detection functions
- Add unit tests for each variant
- Add performance benchmarks
- ARM NEON detection (cross-platform)
- Requires ARM CI/CD infrastructure
- Platform-specific CPUID equivalent
- Full test coverage on ARM hardware
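A minimal sketch of how the planned SSE3/SSSE3/SSE4.1 checks might look, assuming GCC or Clang on x86 and the `<cpuid.h>` helper header. The bit positions are the documented CPUID leaf 1 ECX flags; `hasBit()`, `leaf1Ecx()` and `cpuReportsFma()` are hypothetical helper names, not part of the current nfx-cpu API.

```cpp
#include <cstdint>

#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
#  define SKETCH_HAS_CPUID 1
#  include <cpuid.h> // GCC/Clang helper header; MSVC would use __cpuid() from <intrin.h>
#else
#  define SKETCH_HAS_CPUID 0
#endif

// Feature bits reported in ECX by CPUID leaf 1.
constexpr std::uint32_t kSse3Bit  = 1u << 0;
constexpr std::uint32_t kSsse3Bit = 1u << 9;
constexpr std::uint32_t kFmaBit   = 1u << 12;
constexpr std::uint32_t kSse41Bit = 1u << 19;

// Pure decoding helper (hypothetical name): testable without real hardware.
constexpr bool hasBit(std::uint32_t reg, std::uint32_t mask)
{
    return (reg & mask) != 0;
}

// Reads CPUID.1:ECX, or returns 0 when CPUID is not available in this build.
inline std::uint32_t leaf1Ecx()
{
#if SKETCH_HAS_CPUID
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
    {
        return ecx;
    }
#endif
    return 0;
}

// Each check runs CPUID once and caches the result in a function-local static.
inline bool hasSSE3Support()
{
    static const bool supported = hasBit(leaf1Ecx(), kSse3Bit);
    return supported;
}

inline bool hasSSSE3Support()
{
    static const bool supported = hasBit(leaf1Ecx(), kSsse3Bit);
    return supported;
}

inline bool hasSSE41Support()
{
    static const bool supported = hasBit(leaf1Ecx(), kSse41Bit);
    return supported;
}

// Only reports the CPU flag. Using FMA safely also requires the OS to
// enable AVX register state, which this sketch does not check.
inline bool cpuReportsFma()
{
    static const bool reported = hasBit(leaf1Ecx(), kFmaBit);
    return reported;
}
```

Keeping the bit decoding separate from the CPUID read lets the unit tests in TESTS_FeatureDetection.cpp exercise the decoding with fixed register values.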
- Cache topology:
- CacheInfo struct with L1/L2/L3 sizes, line size
- cacheInfo() - Complete cache hierarchy
- Add TESTS_CacheTopology.cpp test suite
- Add BM_CacheTopology.cpp benchmark suite
- Sample demonstrating cache-aware optimization
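As a rough illustration of the cache topology item, the sketch below decodes one sub-leaf of Intel's deterministic cache parameters (CPUID leaf 4) from raw register values. The `CacheLevel` struct and `decodeLeaf4()` are hypothetical names for illustration; other vendors may expose this data through a different leaf, and the final `CacheInfo` layout is not decided here.

```cpp
#include <cstdint>

// One decoded cache level (hypothetical helper type, sizes in bytes).
struct CacheLevel
{
    unsigned type;           // 0 = no more caches, 1 = data, 2 = instruction, 3 = unified
    unsigned level;          // 1, 2, 3, ...
    std::uint32_t lineSize;  // coherency line size in bytes
    std::uint64_t sizeBytes; // total size of this cache
};

// Decodes the EAX/EBX/ECX values returned by CPUID leaf 4 for one sub-leaf.
// Every geometry field is stored as "value minus one".
constexpr CacheLevel decodeLeaf4(std::uint32_t eax, std::uint32_t ebx, std::uint32_t ecx)
{
    const std::uint32_t lineSize   = (ebx & 0xFFFu) + 1;          // EBX[11:0]
    const std::uint32_t partitions = ((ebx >> 12) & 0x3FFu) + 1;  // EBX[21:12]
    const std::uint32_t ways       = ((ebx >> 22) & 0x3FFu) + 1;  // EBX[31:22]
    const std::uint64_t sets       = static_cast<std::uint64_t>(ecx) + 1;

    return CacheLevel{
        eax & 0x1Fu,         // EAX[4:0]  cache type
        (eax >> 5) & 0x7u,   // EAX[7:5]  cache level
        lineSize,
        static_cast<std::uint64_t>(ways) * partitions * lineSize * sets};
}
```

For example, an 8-way L1 data cache with 64-byte lines and 64 sets decodes to 8 * 1 * 64 * 64 = 32768 bytes. A `cacheInfo()` implementation could loop over sub-leaves until `type` is 0 and fill the planned struct from the results.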
- TSC and frequency:
- hasTSC(), hasInvariantTSC()
- readTSC() - Timestamp counter access
- baseFrequency(), maxFrequency() (MHz)
- hasTurboBoost() detection
- Add unit tests for frequency detection
- Add TSC benchmark for overhead measurement
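A possible shape for the TSC items, assuming the `__rdtsc()` intrinsic provided by GCC, Clang and MSVC on x86. `minTscDelta()` is a hypothetical helper showing how the planned overhead benchmark could take its measurement; on non-x86 builds this sketch simply returns 0.

```cpp
#include <cstdint>
#include <limits>

#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#  define SKETCH_HAS_RDTSC 1
#  include <intrin.h>
#elif defined(__x86_64__) || defined(__i386__)
#  define SKETCH_HAS_RDTSC 1
#  include <x86intrin.h>
#else
#  define SKETCH_HAS_RDTSC 0
#endif

// CPUID bits a real hasTSC()/hasInvariantTSC() would test.
constexpr std::uint32_t kTscBit          = 1u << 4; // CPUID.1:EDX bit 4
constexpr std::uint32_t kInvariantTscBit = 1u << 8; // CPUID.80000007H:EDX bit 8

// Reads the timestamp counter. RDTSC is not a serializing instruction, so
// surrounding instructions may be reordered around it.
inline std::uint64_t readTSC()
{
#if SKETCH_HAS_RDTSC
    return __rdtsc();
#else
    return 0; // no TSC in this sketch on other architectures
#endif
}

// Hypothetical helper: smallest observed difference between two
// back-to-back reads, as a rough estimate of the read overhead in ticks.
inline std::uint64_t minTscDelta(int samples)
{
    std::uint64_t best = std::numeric_limits<std::uint64_t>::max();
    for (int i = 0; i < samples; ++i)
    {
        const std::uint64_t start = readTSC();
        const std::uint64_t stop  = readTSC();
        const std::uint64_t delta = stop - start;
        if (delta < best)
        {
            best = delta;
        }
    }
    return best;
}
```

Taking the minimum over many samples filters out interrupts and thread migrations, which only ever make a single measurement larger.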
- Additional x86 instructions:
- hasBMI() - Bit manipulation instructions (BMI1/BMI2)
- hasPOPCNT() - Population count
- hasFMA() - Already covered by FMA detection in Phase 1 (Extended SIMD detection)
- Add unit tests for each instruction set
- Add benchmarks comparing with/without hardware support
- ARM extensions:
- hasSVE() - Scalable Vector Extension
- hasCRC32() - ARM CRC32 (different from x86)
- Security features:
- hasAESNI() - Hardware AES encryption
- hasSHA() - SHA-1/SHA-256 acceleration
- hasRDRAND(), hasRDSEED() - Hardware RNG
- hasSGX() - Software Guard Extensions
- hasSMEP() - Supervisor Mode Execution Prevention
- Add security feature tests
- Add benchmarks for crypto hardware acceleration
- Sample demonstrating AES-NI usage detection
- NUMA topology:
- numaNodeCount() - NUMA nodes
- NUMA-aware memory allocation helpers
- Add NUMA-aware allocation benchmarks
- Platform-specific implementation (Windows vs Linux)
- Power management:
- hasSpeedStep(), hasThermalMonitor()
- thermalThreshold() - Thermal limits
- Low-level intrinsics:
- prefetch(addr) - Cache prefetch hint
- pause() - Spinlock optimization
- serialize() - Serializing barrier (memory fence)
- Add unit tests for intrinsic wrappers
- Add benchmarks measuring prefetch impact
- Cross-platform intrinsic abstraction
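One way the cross-platform intrinsic abstraction could be sketched, using `__builtin_prefetch` on GCC/Clang, `_mm_prefetch` on MSVC, `_mm_pause` on x86 and the `yield` instruction on AArch64. The `sketch` namespace and `spinUntil()` are assumptions for illustration; a namespace is used here because a global `pause()` would clash with the POSIX function of the same name.

```cpp
#include <atomic>

#if defined(_MSC_VER)
#  include <intrin.h>
#elif defined(__x86_64__) || defined(__i386__)
#  include <immintrin.h>
#endif

namespace sketch
{
    // Hints that the cache line holding addr will be read soon.
    // Falls back to a no-op where no prefetch primitive is known.
    inline void prefetch(const void* addr)
    {
#if defined(__GNUC__) || defined(__clang__)
        __builtin_prefetch(addr, 0 /* read */, 3 /* keep in all cache levels */);
#elif defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
        _mm_prefetch(static_cast<const char*>(addr), _MM_HINT_T0);
#else
        (void)addr;
#endif
    }

    // Tells the core that the caller is in a spin-wait loop.
    inline void pause()
    {
#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
        _mm_pause();
#elif defined(__aarch64__) && (defined(__GNUC__) || defined(__clang__))
        __asm__ __volatile__("yield");
#endif
    }

    // Hypothetical usage helper: spins up to maxSpins times waiting for flag.
    inline bool spinUntil(const std::atomic<bool>& flag, int maxSpins)
    {
        for (int i = 0; i < maxSpins; ++i)
        {
            if (flag.load(std::memory_order_acquire))
            {
                return true;
            }
            pause();
        }
        return flag.load(std::memory_order_acquire);
    }
}
```

Both wrappers are only hints, so unit tests can check that they compile and run on every platform, while the effect of `prefetch()` has to be measured in the benchmarks.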
- Add CpuInfo struct:
- const CpuInfo& cpuInfo() - Singleton with full CPU capabilities
- Structured organization: identification, topology, cache, features, frequency
- Single detection pass, cache everything at once
- Keep free function API (hasAvx2Support(), etc.) for backward compatibility
- Hybrid approach: free functions OR struct depending on use case
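The hybrid approach could look roughly like the sketch below: a function-local static gives a thread-safe singleton that is filled in one detection pass, and the free function forwards to it. The field layout and the `detail` helpers are assumptions, and `__builtin_cpu_supports` (GCC/Clang on x86) stands in for the library's own detection code.

```cpp
struct CpuInfo
{
    // Only the features group is sketched; identification, topology,
    // cache and frequency would sit alongside it.
    struct Features
    {
        bool sse42 = false;
        bool avx   = false;
        bool avx2  = false;
    } features;
};

namespace detail
{
    // Counts detection passes so the single-pass property can be tested.
    inline int& passCounter()
    {
        static int passes = 0;
        return passes;
    }

    // Runs every detection step once and returns the filled struct.
    inline CpuInfo detectAll()
    {
        ++passCounter();
        CpuInfo info;
#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
        __builtin_cpu_init();
        info.features.sse42 = __builtin_cpu_supports("sse4.2") != 0;
        info.features.avx   = __builtin_cpu_supports("avx") != 0;
        info.features.avx2  = __builtin_cpu_supports("avx2") != 0;
#endif
        return info;
    }
}

// Singleton accessor: initialization of a function-local static is
// thread-safe since C++11 and happens on first use.
inline const CpuInfo& cpuInfo()
{
    static const CpuInfo instance = detail::detectAll();
    return instance;
}

// Existing-style free function kept as a thin wrapper over the struct.
inline bool hasAvx2Support()
{
    return cpuInfo().features.avx2;
}
```

Callers that need one flag keep using the free function, while callers that want a full report read the struct. Either path triggers the same single detection pass.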
In Progress
Done ✓
v0.1.3 (2025-11-27)
- CMake Packaging - Bug Fixes and Cleanup
- Remove incorrect runtime dependencies from DEB/RPM packages (header-only library)
- Remove duplicate WiX tool detection
- Simplify RPM architecture detection to use uname directly
v0.1.2 (2025-11-25)
- CPU Feature Detection - Performance Optimization
- Skip runtime CPUID checks when features are compile-time guaranteed
- Return immediately when __AVX2__, __AVX__, or __SSE4_2__ macros are defined
- Eliminate unnecessary static lambda initialization in optimized builds
- Zero overhead detection when compiled with feature flags (-mavx2, etc.)
- Maintain backward compatibility with runtime detection behavior
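The v0.1.2 optimization can be pictured with the sketch below, which is an illustration of the pattern and not the library's actual source. `runtimeAvx2Check()` and `kAvx2GuaranteedAtCompileTime` are hypothetical names, and `__builtin_cpu_supports` stands in for the real CPUID path.

```cpp
// True when the translation unit was compiled with AVX2 enabled
// (for example via -mavx2), so the feature is guaranteed at compile time.
#if defined(__AVX2__)
constexpr bool kAvx2GuaranteedAtCompileTime = true;
#else
constexpr bool kAvx2GuaranteedAtCompileTime = false;
#endif

// Hypothetical stand-in for the runtime CPUID and OS-support checks.
inline bool runtimeAvx2Check()
{
#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") != 0;
#else
    return false;
#endif
}

inline bool hasAvx2Support()
{
#if defined(__AVX2__)
    // Compile-time guaranteed: no CPUID call and no static initialization.
    return true;
#else
    // Runtime path: detect once through a static lambda, then reuse the result.
    static const bool cached = [] { return runtimeAvx2Check(); }();
    return cached;
#endif
}
```

With the feature flag set, the function reduces to a constant that the optimizer can fold away. Without it, behavior matches the earlier runtime detection.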
v0.1.1 (2025-11-12)
- CPU Feature Detection - OS Support Verification
- Add OSXSAVE flag check for AVX/AVX2 detection
- Verify XCR0 register state (XMM/YMM bits) via xgetbv instruction
- Use inline assembly for GCC/Clang to avoid -mxsave flag dependency
- Fix MSVC AVX2 detection to use __cpuidex() properly
- Refactor magic numbers into named constants in internal namespace
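The v0.1.1 checks can be sketched as follows for GCC/Clang on x86. This is an assumption-laden illustration, not the library's code: the constant names, `readXcr0()` and `osEnablesAvxState()` are hypothetical, and `hasAvxSupport()` is assumed to follow the same naming as `hasAvx2Support()`. The inline assembly mirrors the changelog note about avoiding the -mxsave flag.

```cpp
#include <cstdint>

#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
#  define SKETCH_X86_GNU 1
#  include <cpuid.h>
#else
#  define SKETCH_X86_GNU 0
#endif

constexpr std::uint32_t kOsxsaveBit = 1u << 27; // CPUID.1:ECX, OS uses XSAVE/XGETBV
constexpr std::uint32_t kAvxBit     = 1u << 28; // CPUID.1:ECX
constexpr std::uint32_t kAvx2Bit    = 1u << 5;  // CPUID.(EAX=7,ECX=0):EBX
constexpr std::uint64_t kXcr0AvxMask = 0x6;     // XCR0 bit 1 (XMM) and bit 2 (YMM)

// The OS must save and restore both XMM and YMM state for AVX to be usable.
constexpr bool osEnablesAvxState(std::uint64_t xcr0)
{
    return (xcr0 & kXcr0AvxMask) == kXcr0AvxMask;
}

#if SKETCH_X86_GNU
// Reads XCR0 with inline assembly. Only call this after OSXSAVE is confirmed.
inline std::uint64_t readXcr0()
{
    std::uint32_t eax = 0, edx = 0;
    __asm__ __volatile__("xgetbv" : "=a"(eax), "=d"(edx) : "c"(0));
    return (static_cast<std::uint64_t>(edx) << 32) | eax;
}
#endif

inline bool hasAvxSupport()
{
#if SKETCH_X86_GNU
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
    {
        return false;
    }
    if ((ecx & kOsxsaveBit) == 0 || (ecx & kAvxBit) == 0)
    {
        return false;
    }
    return osEnablesAvxState(readXcr0());
#else
    return false;
#endif
}

inline bool hasAvx2Support()
{
#if SKETCH_X86_GNU
    if (!hasAvxSupport() || __get_cpuid_max(0, nullptr) < 7)
    {
        return false;
    }
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    __cpuid_count(7, 0, eax, ebx, ecx, edx); // leaf 7 needs sub-leaf 0 in ECX
    return (ebx & kAvx2Bit) != 0;
#else
    return false;
#endif
}
```

The sub-leaf requirement on leaf 7 is the same reason the MSVC path needs `__cpuidex()`, since plain `__cpuid()` does not set ECX.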