Matrix Multiplication and Vector Processing

SparseP: Towards Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Systems

Several manufacturers have already started to commercialize near-bank Processing-In-Memory (PIM) architectures. Near-bank PIM architectures place simple cores close to DRAM banks and can yield ...

Network World

What are TPUs? Your guide to tensor processing units and AI acceleration

This guide shows how TPUs crush performance bottlenecks, reduce training time, and offer immense scalability via Google Cloud ...

Nature

Sparse Matrix Computations on Graphics Processing Units

Sparse matrix computations are pivotal to advancing high-performance scientific applications, particularly as modern numerical simulations and data analyses demand efficient management of large, ...

Design-Reuse

A custom RISC-V vector instruction to accelerate structured-sparse matrix multiplications

A novel AI-acceleration paper presents a method to optimize sparse matrix multiplication for machine learning models, particularly focusing on structured sparsity. Structured sparsity involves a ...

insideHPC

Intel MKL Speeds Up Small Matrix-Matrix Multiplication for Automatic Driving

Nearly all big science, machine learning, neural network, and machine vision applications employ algorithms that involve large matrix-matrix multiplication. But multiplying large matrices pushes the ...

Ars Technica

Researchers upend AI status quo by eliminating matrix multiplication in LLMs

Researchers claim to have developed a new way to run AI language models more efficiently by eliminating matrix multiplication from the process. This fundamentally redesigns neural network operations ...

Ars Technica

DeepMind breaks 50-year math record using AI; new record falls a week later

Matrix multiplication is at the heart of many machine learning breakthroughs, and it just got faster—twice. Last week, DeepMind announced it discovered a more efficient way to perform matrix ...

EDN

RISC-V Tensor Unit claims to turbocharge AI applications

A new RISC-V Tensor Unit, based on fully customizable 64-bit cores, claims to provide a huge performance boost for artificial intelligence (AI) applications compared to just running software on scalar ...

GSM Arena

ARM announces its next-gen ARMv9 architecture: focus on security, AI and vector processing

The ARMv8 architecture was announced in 2011, a full decade ago. It was a massive change as it moved from 32-bit to 64-bit. Over the last 5 years there have been more than 100 billion ARMv8 devices.
