Granite Compiled Inference via llama.cpp

Introduction
Granite Compiled Inference via llama.cpp is a small C++ program, compiled natively against the llama.cpp library, that runs efficient local inference on GGUF models such as IBM's Granite family.
Project Overview
This component was developed as part of the Quantum Proximity Gateway (QPG) project at UCL. QPG integrates facial recognition, proximity detection, post-quantum encryption, and AI-powered device personalisation to deliver a secure, accessible, and intelligent user experience.
Prerequisites
- CMake
- C++ compiler (e.g., g++ or clang)
- GGUF model file (e.g., Granite 3.2 8B Instruct)
Installation and Setup
- Clone this repository.
- Clone llama.cpp into the same directory:
git clone https://github.com/ggml-org/llama.cpp
- Build llama.cpp:
cd llama.cpp
cmake -B build
cmake --build build --config Release
cd ..
- Compile the inference program (the command below assumes macOS, where the build produces libllama.dylib; on Linux, link against ./llama.cpp/build/bin/libllama.so instead; a sketch of main.cpp follows these steps):
clang++ -std=c++11 \
  -I./llama.cpp/include -I./llama.cpp/ggml/include \
  main.cpp ./llama.cpp/build/bin/libllama.dylib \
  -o gguf_infer -pthread \
  -Wl,-rpath,./llama.cpp/build/bin
- Run the program:
./gguf_infer <model-path.gguf>
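The repository compiles its own main.cpp, which is not reproduced here. For orientation, below is a minimal sketch of what such a program can look like, modelled on the simple example bundled with recent llama.cpp checkouts. The llama.cpp C API changes between releases (llama_model_load_from_file, llama_init_from_model, and the sampler-chain calls below are the post-2024 names), and the demo prompt and 64-token generation cap are arbitrary assumptions, so check every call against the llama.h in your clone.

```cpp
// Sketch of a gguf_infer-style program: load a GGUF model, tokenize a
// prompt, and greedily generate a fixed number of tokens. Modelled on
// llama.cpp's bundled examples/simple; verify names against your llama.h.
#include "llama.h"
#include <cstdio>
#include <string>
#include <vector>

int main(int argc, char ** argv) {
    if (argc < 2) {
        std::fprintf(stderr, "usage: %s <model-path.gguf>\n", argv[0]);
        return 1;
    }
    llama_backend_init();

    // Load the GGUF model from disk with default parameters.
    llama_model * model = llama_model_load_from_file(argv[1], llama_model_default_params());
    if (!model) { std::fprintf(stderr, "failed to load model\n"); return 1; }
    const llama_vocab * vocab = llama_model_get_vocab(model);

    // Tokenize the prompt; a first call with a NULL buffer returns the
    // required token count as a negative number.
    const std::string prompt = "Hello, world!";  // assumption: fixed demo prompt
    const int n_prompt = -llama_tokenize(vocab, prompt.c_str(), prompt.size(), NULL, 0, true, true);
    std::vector<llama_token> tokens(n_prompt);
    llama_tokenize(vocab, prompt.c_str(), prompt.size(), tokens.data(), tokens.size(), true, true);

    // Create a context large enough for the prompt plus generated tokens.
    const int n_gen = 64;  // assumption: arbitrary generation cap
    llama_context_params cparams = llama_context_default_params();
    cparams.n_ctx   = n_prompt + n_gen;
    cparams.n_batch = n_prompt;
    llama_context * ctx = llama_init_from_model(model, cparams);

    // Greedy sampler: always pick the most likely next token.
    llama_sampler * smpl = llama_sampler_chain_init(llama_sampler_chain_default_params());
    llama_sampler_chain_add(smpl, llama_sampler_init_greedy());

    // Feed the prompt, then generate one token at a time.
    llama_batch batch = llama_batch_get_one(tokens.data(), (int32_t) tokens.size());
    for (int i = 0; i < n_gen; i++) {
        if (llama_decode(ctx, batch) != 0) break;
        llama_token tok = llama_sampler_sample(smpl, ctx, -1);
        if (llama_vocab_is_eog(vocab, tok)) break;  // stop at end-of-generation

        char buf[256];
        const int n = llama_token_to_piece(vocab, tok, buf, sizeof(buf), 0, true);
        if (n > 0) std::fwrite(buf, 1, n, stdout);
        std::fflush(stdout);

        batch = llama_batch_get_one(&tok, 1);  // next step decodes only the new token
    }
    std::printf("\n");

    llama_sampler_free(smpl);
    llama_free(ctx);
    llama_model_free(model);
    llama_backend_free();
    return 0;
}
```

Note that for an instruct-tuned model such as Granite 3.2 8B Instruct, you would normally format the prompt with the model's chat template (llama.h exposes llama_chat_apply_template for this) rather than feeding raw text as the sketch does.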
Why Quantum Proximity Gateway?
QPG eliminates the need for manual logins, offering seamless, intelligent, and secure device access using facial recognition and proximity detection, underpinned by post-quantum encryption.
About this Creation
Compiled inference for Granite GGUF models using llama.cpp, built as part of the Quantum Proximity Gateway project. Created by Marwan Yassini Chairi El Kamel, Raghav Awasthi, Abdulhamid Abayomi, and Abdul Muhaymin Abdul Hafiz.