Models I used:

    • RF-DETR – a DETR-style real-time object detector. We fine-tuned it to detect players, jersey numbers, referees, the ball, and even shot types.
    • SAM2 – a segmentation and tracking. It re-identifies players after occlusions and keeps IDs stable through contact plays.
    • SigLIP + UMAP + K-means – vision-language embeddings plus unsupervised clustering. This separates players into teams using uniform colors and textures, without manual labels.
    • SmolVLM2 – a compact vision-language model originally trained on OCR. After fine-tuning on NBA jersey crops, it jumped from 56% to 86% accuracy.
    • ResNet-32 – a classic CNN fine-tuned for jersey number classification. It reached 93% test accuracy, outperforming the fine-tuned SmolVLM2.



    by RandomForests92

    1 Comment

    Leave A Reply