PlantCLEF · 2026 · Submission recap · 7th place · private F1

PlantCLEF 2026
multi species identification in 1m² quadrats

Submission to the 7th LifeCLEF plant identification challenge. Best public Kaggle F1 0.41826, private F1 0.40283, good for 7th place on the private leaderboard.

LUCAS test quadrats · 1m² · Sample of the 2,105 image evaluation set
01 · CBN PdlC A1
02 · CBN Pla A1
03 · GUARDEN 10
04 · LISAH JAS
05 · OPTMix 0108
06 · RNNB 1 1
Team
Arjun Raj · Manindra de Mel · Razeen Wasif · Will Brake

Pipeline

End to end animation

Quadrat in, species set out. The same i002 BioCLIP 2.5 checkpoint runs at 224 and 336 pixels over a 4 by 4 tile grid, the per tile softmax distributions are averaged, class prior logit adjustment is applied, and an adaptive probability threshold emits the final species set.
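A minimal sketch of the tile and average step, assuming NumPy. The helper names `tile_grid` and `mean_tile_probs` are hypothetical, random logits stand in for the BioCLIP 2.5 forward pass, and resizing each tile to 224 or 336 px is omitted. The outputs of the two resolutions could be averaged in the same way, which is also an assumption here.

```python
import numpy as np

def tile_grid(image: np.ndarray, rows: int = 4, cols: int = 4) -> list[np.ndarray]:
    """Split an H x W x C image into rows * cols non-overlapping tiles."""
    h, w = image.shape[:2]
    ys = np.linspace(0, h, rows + 1, dtype=int)  # row boundaries
    xs = np.linspace(0, w, cols + 1, dtype=int)  # column boundaries
    return [
        image[ys[r]:ys[r + 1], xs[c]:xs[c + 1]]
        for r in range(rows)
        for c in range(cols)
    ]

def softmax(logits: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def mean_tile_probs(tile_logits: np.ndarray) -> np.ndarray:
    """(n_tiles, n_classes) logits -> (n_classes,) mean of per-tile softmax."""
    return softmax(tile_logits, axis=-1).mean(axis=0)

rng = np.random.default_rng(0)
image = rng.random((800, 600, 3))   # stand-in quadrat photo
tiles = tile_grid(image)            # 16 tiles for the 4 by 4 grid

# Stand-in for the model: one random logit vector per tile over 7,806 species.
tile_logits = rng.normal(size=(len(tiles), 7806))
probs = mean_tile_probs(tile_logits)
```

Averaging probabilities rather than logits keeps each tile's vote bounded, so one very confident tile cannot dominate the quadrat level distribution.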

7,806 (vasc.) · Species in catalogue
2.65M (i001) · Single plant train images
2,105 · Test quadrats (LUCAS)
0.41826 (ours) · Best public F1, private 0.40283

Overview

7th edition
LifeCLEF 2026
September 2026

We train on isolated single plant photographs and predict every species visible in a 1m² in situ vegetation quadrat. Training and test data live in different visual worlds.

Our pipeline is a single fine tuned BioCLIP 2.5 ViT H/14 with only the last 4 of 32 transformer blocks unfrozen, per task MLP heads at species, genus and family, and a 4 by 4 tile inference at two resolutions. More clean training data without a per species cap (the 2.65 million image i001 manifest combining PlantCLEF 2024 with research grade iNaturalist) is what gave the best result; an explicit cap and tail augmentation in i003 lowered public F1.

Across more than 20 experiments and 600+ Kaggle submissions, the bottleneck kept being the single plant to multi species distribution shift, not model capacity. Held out single plant validation accuracy correlates with quadrat F1 at only r ≈ 0.4. Our best result, public F1 of 0.41826 and private F1 of 0.40283, came from a 224 plus 336 px ensemble of our i002 checkpoint with class prior logit adjustment (τ = 0.25) and an adaptive probability threshold (T = 0.03, k ∈ [2, 10]).
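The two post processing steps can be sketched as follows, assuming NumPy. This is one plausible reading, not the exact submission code: the adjustment subtracts τ times the log class prior from the log probabilities and renormalises, and the selection keeps every species at or above T while clamping the set size to [k_min, k_max]. The function names and the toy inputs are hypothetical.

```python
import numpy as np

def logit_adjust(probs, class_prior, tau=0.25, eps=1e-12):
    """Down-weight frequent classes: log p - tau * log prior, then renormalise."""
    scores = np.log(probs + eps) - tau * np.log(class_prior + eps)
    scores = scores - scores.max()      # stabilise the exponent
    out = np.exp(scores)
    return out / out.sum()

def adaptive_select(probs, threshold=0.03, k_min=2, k_max=10):
    """Keep classes with prob >= threshold, but never fewer than k_min or more than k_max."""
    order = np.argsort(probs)[::-1]     # class indices, most probable first
    n_above = int((probs >= threshold).sum())
    k = min(max(n_above, k_min), k_max)
    return order[:k].tolist()

# Toy example: equal model confidence, but class 1 is rare in training.
adjusted = logit_adjust(np.array([0.5, 0.5]), np.array([0.9, 0.1]))

# A confident single-species prediction is still padded to k_min = 2.
selected = adaptive_select(np.array([0.97, 0.02, 0.01]))
```

With τ = 0 the adjustment is a no-op, and larger τ shifts more probability toward species that are rare in the training prior.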

The task

Task overview

PlantCLEF 2026 is a single multi label track over 7,806 species. Training images are single plant close ups; test images are multi species LUCAS quadrats.

Multi label species ID in 1m² quadrats

Given a top down photograph of a LUCAS vegetation plot, predict every vascular plant species visible. Plots contain 1 to 10 species at varying scale.

Models train on isolated single plant photos and then compose those representations into a multi label prediction on plots where species co occur and occlude each other. The 7,806 species axis is fixed.

Input: 1 RGB image, 800 px max side
Output: Adaptive species set, k ∈ [2, 10]
Train: 2,653,781 single plant photos (i001)
Test: 2,105 LUCAS quadrats
Metric: Macro F1 per quadrat
Our public: 0.41826
Our private: 0.40283
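As a rough illustration of the metric, the sketch below computes an F1 score per quadrat from predicted and true species sets and averages it over quadrats. It is a simplified reading in plain Python. The official Kaggle evaluation may apply additional grouping or averaging, and the function names and quadrat ids are hypothetical.

```python
def quadrat_f1(pred: set, true: set) -> float:
    """F1 between a predicted and a ground-truth species set for one quadrat."""
    if not pred and not true:
        return 1.0                      # nothing to find, nothing predicted
    tp = len(pred & true)               # correctly predicted species
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(true)
    return 2 * precision * recall / (precision + recall)

def mean_quadrat_f1(preds: dict, truths: dict) -> float:
    """Average per-quadrat F1 over every quadrat in the ground truth."""
    scores = [quadrat_f1(preds.get(q, set()), t) for q, t in truths.items()]
    return sum(scores) / len(scores)

# Hypothetical quadrat ids and species ids.
truths = {"q1": {11, 12}, "q2": {30}}
preds = {"q1": {12, 13}, "q2": {30}}
score = mean_quadrat_f1(preds, truths)
```

Because every quadrat weighs the same, over-predicting on a plot with one species costs as much as missing half the species on a crowded plot, which is why the output set size is bounded.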

Datasets

Data sources

All splits share the canonical 7,806 species axis. Taxonomy joins on gbif_species_id.

01 · TRAIN

PlantCLEF 2024 single plant

Pl@ntNet close up photographs across 7,806 species. The original supervised training set.

1,408,033 images·PC24
02 · TRAIN ADD

iNaturalist research grade

Research grade iNaturalist images, de duplicated and image verified, added to the manifest during the i001 data rebuild.

1,245,748 images·iNat
03 · TRAIN UNION

i001 manifest (deployed)

What the deployed i002 model trains on: stratified 90/10 by species (seed 42), species with under 5 images held entirely in train.

2,653,781 images·i001
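The split rule can be sketched in plain Python as below. The function name, the (image_id, species_id) input layout, and the rounding of the validation share are assumptions. Only the 90/10 ratio, the seed of 42, and the under 5 images rule come from the description above.

```python
import random
from collections import defaultdict

def stratified_split(items, val_frac=0.10, min_count=5, seed=42):
    """Split (image_id, species_id) pairs per species; tiny species stay in train."""
    rng = random.Random(seed)
    by_species = defaultdict(list)
    for image_id, species_id in items:
        by_species[species_id].append(image_id)

    train, val = [], []
    for species_id in sorted(by_species):       # fixed order for reproducibility
        ids = sorted(by_species[species_id])
        if len(ids) < min_count:
            train.extend(ids)                   # too few images to hold any out
            continue
        rng.shuffle(ids)
        n_val = max(1, round(len(ids) * val_frac))  # assumption: at least one val image
        val.extend(ids[:n_val])
        train.extend(ids[n_val:])
    return train, val

# Toy manifest: one species with 20 images, one with only 3.
items = [(f"a_{i}", "a") for i in range(20)] + [(f"b_{i}", "b") for i in range(3)]
train_ids, val_ids = stratified_split(items)
```

Sorting before shuffling makes the split depend only on the seed and the manifest contents, not on the order in which rows were read.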
04 · TEST

LUCAS vegetation plots

In situ 1m² quadrats with 1 to 10 species each, expert annotated. The distribution we transfer to.

2,105 plots·LUCAS
05 · TAXONOMY

GBIF taxonomic backbone

7,806 species into 1,446 genera and 181 families. Drives the auxiliary taxonomy heads at training time.

7,806 / 1,446 / 181·GBIF
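One way to derive genus and family targets for the auxiliary heads is to build index maps from the taxonomy rows, as in the sketch below. The row layout, function name, and numeric ids are placeholders, not real GBIF keys or the project's actual schema.

```python
def build_taxonomy_maps(rows):
    """rows: (gbif_species_id, genus, family) tuples, one per species."""
    rows = sorted(rows)                              # class order follows species id
    genera = sorted({g for _, g, _ in rows})
    families = sorted({f for _, _, f in rows})
    genus_index = {g: i for i, g in enumerate(genera)}
    family_index = {f: i for i, f in enumerate(families)}

    species_ids = [s for s, _, _ in rows]
    genus_of = [genus_index[g] for _, g, _ in rows]      # species class -> genus class
    family_of = [family_index[f] for _, _, f in rows]    # species class -> family class
    return species_ids, genus_of, family_of, genera, families

# Placeholder ids for illustration only.
rows = [
    (1003, "Fagus", "Fagaceae"),
    (1001, "Quercus", "Fagaceae"),
    (1002, "Quercus", "Fagaceae"),
    (1004, "Bellis", "Asteraceae"),
]
species_ids, genus_of, family_of, genera, families = build_taxonomy_maps(rows)
```

With these lookups a species label yields its genus and family labels for free, so the three heads can be supervised from a single annotation per image.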

Results

Figures

Scores pulled from the Kaggle public API for plantclef-2026 and merged with our local submission logs (240 submissions across 18 experiments).

Figure 1

Score progression across experiments

Each dot is the best public Kaggle F1 from one experiment. The dashed line tracks best so far. 010 is the partial unfreeze breakthrough; i002 is long tail capping with adaptive selection.

Figure 2

Val and Kaggle move in opposite directions

We hold every other knob fixed and sweep the number of unfrozen transformer blocks. Val top 5 rises monotonically; Kaggle F1 peaks sharply at n=4. The bottleneck is train versus test distribution, not capacity.

Figure 3

Validation accuracy is a poor Kaggle predictor

Each dot is one training run. Validation top 1 (x) and Kaggle public F1 (y) are only weakly correlated, Pearson r = 0.41. Within the 014b sweep the relationship inverts: n=5 has the best val and the worst Kaggle.
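For reference, Pearson r between per run validation accuracy and leaderboard F1 can be computed as in this sketch, assuming NumPy. The arrays are synthetic placeholders, not our run data, and `pearson_r` is a hypothetical helper name.

```python
import numpy as np

def pearson_r(x, y) -> float:
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

# Synthetic placeholder values, one entry per imaginary run.
val_top1 = [0.60, 0.65, 0.70, 0.72, 0.75]
kaggle_f1 = [0.36, 0.38, 0.37, 0.40, 0.35]
r = pearson_r(val_top1, kaggle_f1)
```

A value near 1 would mean validation accuracy ranks runs the same way the leaderboard does; a value around 0.4 means validation explains only a small share of the leaderboard variance.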

Figure 4

015 validation climbs while Kaggle stalls

Every taxonomic level rises smoothly across the 5 epoch PC24 plus iNat run. Yet Kaggle public F1 only moved from 0.37506 at epoch 1 to 0.37956 at epoch 5, still below the 0.38333 anchor set by experiment 010 without iNat. The run adds capacity to the wrong distribution.

Figure 5

Species image counts, before and after the 500 cap

PC24 is heavily right skewed. 2,263 species have 501 to 1,000 training images while 145 have a single image. The 500 cap from i003 collapses the right tail but leaves 1,644 species with under 100 images unchanged.
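A per species cap of this kind can be sketched in plain Python as below. The function name, the random subsampling, and the seed are assumptions; only the cap of 500 comes from the caption.

```python
import random
from collections import Counter, defaultdict

def cap_per_species(items, cap=500, seed=0):
    """Keep at most `cap` randomly chosen images per species; smaller species are untouched."""
    rng = random.Random(seed)           # seed is an arbitrary choice for this sketch
    by_species = defaultdict(list)
    for image_id, species_id in items:
        by_species[species_id].append(image_id)

    kept = []
    for species_id in sorted(by_species):
        ids = sorted(by_species[species_id])
        if len(ids) > cap:
            ids = rng.sample(ids, cap)  # subsample only the over-represented species
        kept.extend((image_id, species_id) for image_id in ids)
    return kept

# Toy manifest: one head species and one tail species.
items = [(f"common_{i}", "common") for i in range(1200)]
items += [(f"rare_{i}", "rare") for i in range(80)]
counts = Counter(species for _, species in cap_per_species(items))
```

The cap only trims the head of the distribution, so species below the cap keep exactly the images they started with, which matches the unchanged under 100 bucket in the figure.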

Submissions

Per experiment ranking

Ranked by public F1. Highest public was 0.41826; highest private was 0.40600.

#  | Exp               | Recipe                                                        | Public F1 | Private F1
01 | i002 (Cap, extra) | ensemble 224 + 336 px, LA tau 0.25, probT 0.03 (best public)  | 0.41826   | 0.40283
02 | i002 (Cap, extra) | 224 + 336 + GBIF DOY phenology Boltzmann prior                | 0.41346   | 0.38766
03 | i002 (Cap, extra) | 224 + 336, LA tau 0.25, genus rerank, probT 0.035             | 0.41246   | 0.39946
04 | i002 (Cap, extra) | last_blocks 4, softmax mean, probT 0.05                       | 0.41165   | 0.38132
05 | i002 (Cap, extra) | last_blocks 336 px (pos_embed bicubic resize)                 | 0.41117   | 0.38376

Resources

References

Code repository, method paper, and per experiment notes.

CODE

Code repository

Training, inference, and analysis code for the ARM Wision submission, covering the BioCLIP 2.5 partial fine tune, the tiled inference pipeline, and the full experiment ladder.

GitHub·arm-wision/PlantCLEF2026
FORTHCOMING

Method paper

Partial unfreeze sweet spot, long tail capping, validation versus leaderboard decoupling, and ablations across the experiment ladder. Will be linked here when published.

CEUR WS·In preparation
REFERENCE

Experiment ladder

Per experiment summaries from 001 through i003 with recipe, validation metrics, Kaggle public and private scores, and the lessons that carried forward.

28 directions·600+ submissions
REFERENCE

Data manifest

The deployed i002 model trains on the 2.65 million image i001 manifest combining PlantCLEF 2024 with a research grade iNaturalist pull, stratified 90/10 by species with seed 42.

2,653,781 images·7,806 species

Team

Authors

ANU Deep Learning, Semester 1 2026.

Razeen Wasif · u7283652
Will Brake · u7480294

Cite this work

BibTeX.

If you build on this work, please also cite the PlantCLEF 2026 overview paper and the Pl@ntNet observation pipeline.

@inproceedings{armwision2026plantclef,
  title     = {ARM Wision at PlantCLEF 2026: Our Submission to the Multi Species Vegetation Plot Challenge},
  author    = {Raj, Arjun and de Mel, Manindra and Wasif, Razeen and Brake, Will},
  booktitle = {Working Notes of CLEF 2026},
  series    = {CEUR Workshop Proceedings},
  year      = {2026}
}