Powerful AI that doesn't use heaps of energy one step closer thanks to Microsoft

Microsoft releases open source BitNet, which simplifies AI large language models so that they run faster and with less energy, even on people's local computers

21st Oct 24, 10:09am by Juha Saarinen

Dall-E tries to illustrate a 1-bit LLM running on a smartphone.

Apologies for the frequent artificial intelligence (AI) stories but the technology is still crashing through everything, showing no signs of stopping. This is despite its obvious flaws: it is inherently unreliable and hard to understand, and it requires planet-destroying amounts of computing resources and electricity.

Let's go a bit Deep Geek and see where we are with the above.

Leaving the first two issues aside, Microsoft has released and open-sourced BitNet, which might just help with the latter two. BitNet is "the official inference framework for 1-bit large language models" which, as the Institute of Electrical and Electronics Engineers (IEEE) says, could help cut AI's immense energy use and make models much faster to run.

So much so that you can run very large LLMs on your own computers instead of a giant server farm in the cloud, hosted in a data centre somewhere.

I'm able to run Meta's Llama 3.1 with 70 billion parameters on my MacBook Pro M3 Max with 96 gigabytes of memory, but it's not exactly blazingly fast on the laptop. Furthermore, Llama 3.1:70b chews through the MacBook's battery in no time as you interact with the model.

As I run Llama 3.1:70b, I can see the graphics cores on the M3 Max being loaded up to the hilt while the amount of free memory drops rapidly. The MacBook becomes toasty and the fans start up, indicating high energy usage.

Enter BitNet, which according to the readme promises:

Speedups of 1.37 to 5.07 times on Arm central processing units (CPUs), with larger models seeing greater performance gains.
Energy usage reduced by 55.4 to 70 per cent.
Even greater speedups and energy reductions on Intel/AMD x86 CPUs.
The ability to run a 100 billion parameter b1.58 model on a single CPU at speeds comparable to human reading, which is five to seven tokens per second.

Model inference on Apple M2 Ultra. Source: Microsoft

"Parameters" is a term you hear mentioned with AI and machine learning (ML) once you start poking under the hood of the technology. AI vendor Perplexity.ai defines parameters as:

"AI parameters are the adjustable elements within a model that are learned from training data, including weights, biases, and scaling factors.

These parameters are crucial for the model's ability to learn from data and make accurate predictions or decisions. During training, optimisation algorithms adjust these parameters to minimise the error between the model's predictions and the actual values, thereby enhancing performance.

The complexity and number of parameters can significantly influence a model's ability to capture intricate patterns in data, with too many parameters risking overfitting and too few leading to underfitting."

See what I mean about "hard to understand"? Long story short, small models might have something like three to eight billion parameters; there are medium-sized ones like Llama 3.1:70b that I've been trying out. Then you hit the big ones with hundreds of billions of parameters, which won't fit on a single local machine... unless you go down the BitNet route.
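To get a feel for why the bit width matters so much, here is some back-of-the-envelope arithmetic in Python. It is a rough sketch only: `model_size_gb` is a made-up helper, it counts the weights and nothing else, and it assumes a three-valued weight can be stored in its theoretical minimum of log2(3), or about 1.58, bits. Real file formats will be somewhat larger.

```python
import math

def model_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate storage for the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9

# A weight with three possible values carries log2(3) bits of information,
# which is where the "1.58" in b1.58 comes from.
TERNARY_BITS = math.log2(3)

for label, params in [("8 billion", 8e9), ("70 billion", 70e9), ("100 billion", 100e9)]:
    fp16 = model_size_gb(params, 16)
    ternary = model_size_gb(params, TERNARY_BITS)
    print(f"{label} parameters: {fp16:.0f} GB at 16-bit, {ternary:.1f} GB at ~1.58-bit")
```

On those assumptions, a 100 billion parameter model shrinks from around 200 GB at 16 bits per weight to roughly 20 GB, which is the difference between needing a server and fitting in a laptop's memory.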

BitNet takes all that data, which is stored as decimal values in 16- and 32-bit floating point numbers, and simplifies it. AI engineer Rohan Paul explains it like this:

🧠 Let's break down this quantization process in BitNet b1.58 in simple terms:

1️⃣ Starting point:
We have a regular neural network with weights that can be any decimal number.

2️⃣ Goal:
We want to convert all these weights to only three possible values: -1, 0, or +1.

3️⃣ The… pic.twitter.com/jwHNoREwsm
— Rohan Paul (@rohanpaul_ai) October 20, 2024
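The starting point and goal above can be sketched in a few lines of Python. How the rounding is done is my assumption, not something taken from the embedded post: as I understand the BitNet b1.58 paper, each weight is divided by the average absolute weight, then rounded and clipped to the nearest of -1, 0 and +1. The function name and sample numbers are made up for illustration.

```python
def quantize_ternary(weights):
    """Map float weights to -1, 0 or +1, plus one shared scale factor.

    Assumption: "absmean" scaling, i.e. divide by the mean absolute value,
    round to the nearest integer, then clip to the range [-1, 1].
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0:
        return [0] * len(weights), 0.0
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale

weights = [0.1365, -0.6368, 0.02, 0.9, -0.05, -0.4]
q, scale = quantize_ternary(weights)
print(q)  # [0, -1, 0, 1, 0, -1]
```

Small weights collapse to 0, large ones snap to -1 or +1, and the single `scale` value is kept so the results can be stretched back to roughly the right size later.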

Instead of doing matrix maths on 16-bit floating point values like 0.1365 and 32-bit ones like 0.63680464, it's just -1, 0 and +1.
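Why that helps: multiplying by -1, 0 or +1 is not really multiplication at all. You either add the input, subtract it, or skip it. The toy Python below shows the idea for a matrix-vector product. It is a hypothetical illustration of the principle, not how BitNet's optimised kernels are actually written.

```python
def ternary_matvec(ternary_rows, x, scale=1.0):
    """Multiply a matrix of -1/0/+1 weights by a vector using only adds and subtracts."""
    out = []
    for row in ternary_rows:
        total = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                total += xi      # +1: add the input
            elif w == -1:
                total -= xi      # -1: subtract the input
            # 0: skip the input entirely
        out.append(total * scale)  # one multiplication per output row
    return out

W = [[1, 0, -1],
     [0, 1, 1]]
x = [0.5, 2.0, -1.5]
print(ternary_matvec(W, x))  # [2.0, 0.5]
```

The only true multiplication left is the single rescaling per output value, which is a large part of why such models can be cheaper to run on an ordinary CPU.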

The last part of Paul's explanation: "think of it like simplifying a detailed colour image into just black, white and grey; you lose some detail, but the main features are still there, and it becomes much easier to process," sums it up.

If BitNet and similar implementations take off (more work is needed), things could get even more interesting, with powerful and less energy-intensive AI models literally at people's fingertips.
