# Overview

The Native Emotions Library is a portable C++ library for real-time facial emotion tracking and analysis.

The SDK provides wrappers in the following languages:

* C++ (native)
* C
* Python
* C# / .NET
* Java (Android)

## Getting Started

### Hardware requirements

The SDK doesn't have any special hardware requirements:

- **CPU:** Any modern 64-bit CPU (x86-64 with AVX, or ARMv8) is supported
- **GPU:** No special requirement
- **RAM:** At least 2 GB of available RAM
- **Camera:** No special requirement; minimum resolution: 640x480

### Software requirements

The SDK is regularly tested on the following Operating Systems:

- Windows 10+
- Ubuntu 24.04+
- macOS 15+
- iOS 18+
- Android 6.0+ (API level 23+)

### 3rd Party Licenses

While the SDK is released under a proprietary license, the following open-source projects are used in it under their respective licenses:

- OpenCV - [3 clause BSD](https://opencv.org/license/)
- Tensorflow - [Apache License 2.0](https://github.com/tensorflow/tensorflow/blob/master/LICENSE)
- Protobuf - [3 clause BSD](https://github.com/protocolbuffers/protobuf/blob/master/LICENSE)
- zlib - [zlib license](https://www.zlib.net/zlib_license.html)
- minizip-ng - [zlib license](https://github.com/zlib-ng/minizip-ng/blob/master/LICENSE)
- stlab - [Boost Software License 1.0](https://github.com/stlab/libraries/blob/main/LICENSE)
- pybind11 - [3 clause BSD](https://github.com/pybind/pybind11/blob/master/LICENSE)
- fmtlib - [MIT License](https://github.com/fmtlib/fmt/blob/master/LICENSE.rst)

### Installation

#### C++

Extract the SDK contents, include the headers from the `include` folder and link `libNativeEmotionsLibrary` to your C++ project.
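
For example, a GCC or Clang build might be invoked like this (the paths below are illustrative and depend on where you extracted the SDK):

```shell
# Illustrative paths - adjust them to match your extracted SDK layout
g++ -std=c++17 main.cpp \
    -I/path/to/sdk/include \
    -L/path/to/sdk/lib \
    -lNativeEmotionsLibrary \
    -o my_app
```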

#### C

Extract the SDK contents, include `tracker_c.h` from the `include` folder and link `libNativeEmotionsLibrary` to your C project.

#### Python

The Python version of the SDK can be installed with pip:

```bash
$ pip install realeyes.emotion-detection
```

#### C# / .NET

The .NET version of the SDK can be installed via NuGet:

```bash
$ dotnet add package Realeyes.EmotionTracking
```

#### Java

For Android projects, add the library to your `build.gradle` dependencies.

## Usage

### C++

The main entry point of this library is the `nel::Tracker` class.

After a **tracker** object is constructed, the user can call the `nel::Tracker::track()` function to process
a frame from a video or other frame source.

The `nel::Tracker::track()` function has two overloads, both of which are non-blocking asynchronous calls: one returns a
`std::future<ResultType>`, the other accepts a callback that is invoked on completion. Calls can be issued
back-to-back without waiting for earlier results.

For the frame data, the user must construct a `nel::ImageHeader` object. The header is a non-owning view:
the underlying frame buffer only needs to remain valid for the duration of the `nel::Tracker::track()`
call, because the library copies the frame data internally.

The following example shows the basic usage of the library using OpenCV for loading images and feeding them to the tracker:

```cpp
#include "tracker.h"

#include <opencv2/core.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/videoio.hpp>

#include <chrono>
#include <iostream>

int main()
{
    nel::Tracker tracker("model/model.realZ");

    cv::VideoCapture video("video.mp4");
    cv::Mat frame;

    while (video.read(frame)) {
        nel::ImageHeader header{
            frame.ptr(),
            frame.cols,
            frame.rows,
            static_cast<int>(frame.step1()),
            nel::ImageFormat::BGR
        };
        int64_t timestamp_in_ms = static_cast<int64_t>(video.get(cv::CAP_PROP_POS_MSEC));

        // Track asynchronously using std::future
        auto future = tracker.track(header, std::chrono::milliseconds(timestamp_in_ms));
        auto result = future.get();

        // Process results
        std::cout << "Face tracking: " << (result.landmarks.isGood ? "good" : "failed") << std::endl;
        for (const auto& emotion : result.emotions) {
            std::cout << "  Probability: " << emotion.probability
                      << " Active: " << emotion.isActive << std::endl;
        }
    }
    return 0;
}
```

### C

The main entry point is the `NELTracker` opaque pointer type with associated functions.

After creating a tracker with `nel_tracker_new()`, you can track frames by calling `nel_tracker_track()`
with a callback function. The callback will be called asynchronously when tracking completes.

The following example shows basic usage:

```c
#include "tracker_c.h"
#include <stdio.h>
#include <stdlib.h>

void track_callback(void* user_data, NELResultType* result, const char* error_msg) {
    if (error_msg != NULL) {
        printf("Error: %s\n", error_msg);
        return;
    }

    printf("Face tracking: %s\n", result->landmarks->isGood ? "good" : "failed");
    for (int i = 0; i < result->emotions->count; i++) {
        printf("  Emotion %d - Probability: %f, Active: %d\n",
               result->emotions->emotions[i].emotionID,
               result->emotions->emotions[i].probability,
               result->emotions->emotions[i].isActive);
    }
}

int main() {
    char* error_msg = NULL;
    NELTracker* tracker = nel_tracker_new("model/model.realZ", 0, &error_msg);
    if (tracker == NULL) {
        printf("Failed to load model: %s\n", error_msg);
        free(error_msg);
        return 1;
    }

    // Prepare image data (example with dummy data)
    uint8_t image_data[640 * 480 * 3];  // RGB image
    NELImageHeader header = {
        .data = image_data,
        .width = 640,
        .height = 480,
        .stride = 640 * 3,
        .format = NELImageFormatRGB
    };

    nel_tracker_track(tracker, &header, 0, track_callback, NULL);

    // Clean up - in real code, wait for the callback to complete before freeing the tracker
    nel_tracker_free(tracker);
    return 0;
}
```

### Python

The main entry point of this library is the `realeyes.emotion_detection.Tracker` class.

After a **tracker** object is constructed, the user can call the `realeyes.emotion_detection.Tracker.track()`
function to process frames from a video or other frame source.

The following example shows the basic usage of the library using OpenCV for loading images:

```python
import realeyes.emotion_detection as nel
import cv2

# Initialize the tracker
tracker = nel.Tracker('model/model.realZ')

# Open video
video = cv2.VideoCapture('video.mp4')

while True:
    ret, frame = video.read()
    if not ret:
        break

    # Convert BGR to RGB (OpenCV uses BGR)
    frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

    # Track emotions (timestamp in milliseconds)
    timestamp_ms = int(video.get(cv2.CAP_PROP_POS_MSEC))
    result = tracker.track(frame_rgb, timestamp_ms)

    # Process results
    print(f"Face tracking: {'good' if result.landmarks.is_good else 'failed'}")
    for emotion in result.emotions:
        print(f"  Emotion ID {emotion.emotion_id}: "
              f"Probability={emotion.probability:.3f}, "
              f"Active={emotion.is_active}")

video.release()
```

### C# / .NET

The main entry point is the `EmotionTracker` class.

After a **tracker** object is constructed, you can call the `TrackAsync()` method to track faces
in a frame. The method returns a `Task<TrackingResult>` allowing for asynchronous, non-blocking operation.

Both the constructor and tracking method support concurrent execution - you can start multiple operations
in parallel without waiting for results.

The following example demonstrates processing a video frame:

```csharp
using Realeyes.EmotionTracking;
using System;
using System.Threading.Tasks;

class Program
{
    static async Task Main(string[] args)
    {
        // Create tracker with model file
        using var tracker = new EmotionTracker("model/model.realZ");

        // Prepare image data (example with dummy RGB data)
        byte[] imageData = new byte[640 * 480 * 3];
        var imageHeader = new ImageHeader
        {
            Data = imageData,
            Width = 640,
            Height = 480,
            Stride = 640 * 3,
            Format = ImageFormat.RGB
        };

        // Track emotions asynchronously
        var result = await tracker.TrackAsync(imageHeader, TimeSpan.Zero);

        // Process results
        Console.WriteLine($"Face tracking: {(result.LandmarkData?.IsGood ?? false ? "good" : "failed")}");

        if (result.Emotions.Happy is { } happy)
            Console.WriteLine($"Happy: {happy.Probability:P2}, Active: {happy.IsActive}");

        if (result.Emotions.Confusion is { } confusion)
            Console.WriteLine($"Confusion: {confusion.Probability:P2}, Active: {confusion.IsActive}");
    }
}
```

### Java

The main entry point is the `Tracker` interface.

After creating a **tracker** object, you can call the `track()` method to process frames.
The method returns a `TrackerResultFuture` for asynchronous result retrieval.

The following example shows basic usage:

```java
import com.realeyesit.nel.*;

public class Example {
    public static void main(String[] args) throws Exception {
        // Create tracker with model file
        Tracker tracker = Emotion.createTracker("model/model.realZ", 0);

        // Prepare image data (example with dummy RGB data)
        byte[] imageData = new byte[640 * 480 * 3];
        ImageHeader header = new ImageHeader();
        header.setData(imageData);
        header.setWidth(640);
        header.setHeight(480);
        header.setStride(640 * 3);
        header.setFormat(ImageFormat.RGB);

        // Track emotions asynchronously
        TrackerResultFuture future = tracker.track(header, 0);
        ResultType result = future.get();

        // Process results
        System.out.println("Face tracking: " +
            (result.getLandmarks().getIsGood() ? "good" : "failed"));

        for (EmotionData emotion : result.getEmotions()) {
            System.out.println("  Emotion: " + emotion.getEmotionID() +
                " Probability: " + emotion.getProbability() +
                " Active: " + emotion.getIsActive());
        }
    }
}
```

## Results

The tracking result contains a `nel::LandmarkData` structure and a `nel::EmotionResults` vector.

- The `nel::LandmarkData` consists of the following members:
   - **scale**, the size of the face (a larger value means the user is closer to the camera)
   - **roll**, **pitch**, **yaw**, the 3 Euler angles of the face pose
   - **translate**, the position of the head center on the frame
   - the **landmarks2d** vector with either 0 or 49 points,
   - the **landmarks3d** vector with either 0 or 49 points,
   - and the **isGood** boolean value.

   The **isGood** flag indicates whether the tracking is deemed good enough.

   **landmarks2d** and **landmarks3d** contain 0 points if the tracker failed to find a face on the image; otherwise they always contain 49 points in the following structure:

   ![landmarks](landmarks.png)

   **landmarks3d** contains the coordinates of the frontal face in 3D space, with zero translation and unit scale.
- The `nel::EmotionResults` contains multiple `nel::EmotionData` elements with the following members:
   - **probability**, the probability of the emotion
   - **isActive**, whether the probability is higher than an internal threshold
   - **isDetectionSuccessful**, whether the tracking quality was good enough to reliably detect this emotion

   The order of the `nel::EmotionData` elements is the same as the order of the emotions in `nel::Tracker::get_emotion_IDs()` and `nel::Tracker::get_emotion_names()`.
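
In other words, result index *i* corresponds to index *i* of both lists. With placeholder values standing in for what the tracker would actually return, the pairing can be sketched as:

```python
# Placeholder values standing in for the tracker's output; the real lists
# come from get_emotion_IDs() / get_emotion_names() and share one ordering.
emotion_ids = [1, 5, 12]
emotion_names = ["happy", "surprise", "confusion"]
probabilities = [0.82, 0.10, 0.35]  # one EmotionData per emotion, same order

# Pair each result with its ID and name by index
paired = list(zip(emotion_ids, emotion_names, probabilities))
for emotion_id, name, probability in paired:
    print(f"{name} (id={emotion_id}): {probability:.2f}")
```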

## Interpretation of the classifier output

The **probability** output of the Realeyes classifier (from the `nel::EmotionData` structure) has the following properties:

- It is a continuous value in the [0,1] range
- It changes depending on the type and number of facial features activated
- It typically indicates facial activity in regions of the face that correspond to a given facial expression
- Strong facial wrinkles or shadows can amplify the classifier's sensitivity to the corresponding facial regions
- It is purposefully sensitive, as the classifier is trained to capture slight expressions
- It **should not be interpreted as the intensity** of a given facial expression
- It is not possible to prescribe which facial features correspond to which output levels, due to the nature of the ML models used

We recommend the following interpretation of the **probability** output:

- **values close to 0**
   - no or very little activity on the face with respect to a given facial expression
- **values between 0 and binary threshold**
   - some facial activity was perceived, though in the view of the classifier it does not amount to a basic facial expression
- **values just below binary threshold**
   - high facial activity was perceived, which under some circumstances may be interpreted as true basic facial expression, while under others not (e.g. watching ads vs. playing games)
- **values above binary threshold**
   - high facial activity was perceived, which in the view of the classifier amounts to a basic facial expression
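
As a sketch, the four bands above can be expressed as a simple classification function. The threshold and margin values here are illustrative only - the actual binary threshold is internal to each classifier and is surfaced through the **isActive** flag:

```python
def interpret_probability(probability, binary_threshold, margin=0.1):
    """Map a classifier probability to one of the four interpretation bands.

    binary_threshold and margin are illustrative parameters; the SDK's real
    threshold is internal and exposed only via the isActive flag.
    """
    if probability < 0.05:
        return "no or very little facial activity"
    if probability >= binary_threshold:
        return "high activity: a basic facial expression"
    if probability >= binary_threshold - margin:
        return "high activity: context-dependent interpretation"
    return "some activity, below a basic facial expression"

# Examples with an illustrative threshold of 0.5
print(interpret_probability(0.02, 0.5))  # close to 0
print(interpret_probability(0.30, 0.5))  # between 0 and the threshold
print(interpret_probability(0.45, 0.5))  # just below the threshold
print(interpret_probability(0.70, 0.5))  # above the threshold
```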
