
How Wildlife Traffickers Are Using Coded Language to Sell Protected Animals On Facebook

A Bellingcat investigation has identified nine Facebook groups with a combined membership of more than 70,000 people, in which coded language has helped illegal wildlife dealers evade bans on the platform for years. Facebook says it prohibits any form of animal trading on its platform.

Investigating the operators behind all nine groups, Bellingcat identified six Facebook profiles that led back to a single broker in Jakarta, Indonesia. This investigation was carried out in partnership with Mongabay. You can read their report in English here and in Bahasa Indonesia here.

In an open Facebook group, brazenly titled “West Bogor Animal Selling and Trading Forum,” one member posts an advert for a vulnerable rhinoceros hornbill.

Screenshots of an online advertisement for a rhinoceros hornbill chick, a protected and vulnerable species, posted on Facebook on July 11, 2025.

Commenting on the advert, another member warns: “Just be careful not to get caught.” 

Screenshot of a Facebook conversation, translated from Bahasa Indonesia and posted in July 2025. Annotated by Bellingcat.

“That’s the risk,” replies the seller. 

Under Indonesian law, the capture, trade, or possession of a rhinoceros hornbill is punishable by up to five years’ imprisonment or a fine of up to Rp100 million (US$6,000). (According to Statistics Indonesia, the average monthly wage in August 2025 was just over Rp3 million or US$180.)

Meta also states that the buying and selling of animals on its platforms is prohibited. However, in this group, along with eight others identified by Bellingcat, animals have been traded in plain sight for years, including wild and protected species. Three of the nine groups have been live on Facebook for at least five years. Four have been active for 12 months or more, and the remaining two were created in 2025.

Screenshots of tortoises, monkeys, and owls for sale, posted in Facebook adverts in October 2025.

All nine groups state in their “About” tab that they are based in or around Jakarta, the Indonesian capital. As one of the most biodiverse countries in the world, Indonesia is a hotspot for poachers and a key transit hub in the illegal wildlife trade.

A quick scan of these groups revealed a variety of protected species for sale, including Javan coucals, Javan scops owls, Javan langurs, binturongs, and both wreathed and rhinoceros hornbills.

In one of the most active groups, West Bogor Animal Selling and Trading Forum, more than 200 adverts were posted in a single week. Of these, 18 advertised vulnerable species, including these two infant silvery gibbons. 

Screenshots of two infant silvery gibbons advertised on Facebook on May 10, 2025.

With fewer than 2,500 mature individuals left in the wild, the silvery gibbon is considered endangered. Under Indonesian law, trading in this species can result in up to five years’ imprisonment or a fine of up to Rp 100 million (US$6,000).

Otters were also frequently posted in the group. Popular in the Southeast Asian pet trade, most otter species are protected due to declining numbers in the wild. However, because many of the adverts were for infants, it was not always possible to determine which otter species was being sold, and therefore whether it was protected.

“Using Codes So The Group Stays Safe”

Despite Facebook’s total ban on animal trading, including pets, members of the group titled “Civet/Pet Buying and Selling in the Greater Jakarta Area” were instructed in the “About” tab to “prioritise using codes so the group stays safe from being banned.”

Screenshot of the group’s About description. Translated and annotated by Bellingcat.

Alphanumeric codes were used to discuss animal prices in eight of the nine groups identified by Bellingcat. According to the Indonesian news outlet Jateng Today, the use of pricing codes, intended to circumvent Facebook’s automated moderation systems, is not uncommon among animal traders on the platform.

Such codes use the letters A, B, and C to denote different Indonesian rupiah denominations. A stands for a Rp100,000 note (about US$6), while B represents a Rp50,000 note (about US$3). An accompanying number specifies the quantity, so A3 indicates three Rp100,000 notes.
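The scheme can be sketched in a few lines of code. This is an illustrative reconstruction only: the `DENOMINATIONS` mapping and `decode_price` function are our own names, and only the A and B values reported here are included (the groups also use a C code, whose value is not given).

```python
# Illustrative sketch of the pricing code described above: a letter denotes
# a rupiah banknote denomination, and the trailing number its quantity.
# Only A (Rp100,000) and B (Rp50,000) are defined in the reporting, so this
# mapping is deliberately partial.
DENOMINATIONS = {"A": 100_000, "B": 50_000}

def decode_price(code: str) -> int:
    """Return the rupiah value of a code such as 'A3' (three Rp100,000 notes)."""
    letter, count = code[0].upper(), int(code[1:])
    return DENOMINATIONS[letter] * count

print(decode_price("A3"))  # 300000 (three Rp100,000 notes)
print(decode_price("A2"))  # 200000 (Rp200,000, about US$12)
```

Because the letters never mention money directly, posts like “A2” read as noise to keyword-based moderation while remaining perfectly legible to group members.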

Screenshot of a conversation on Facebook discussing the price of animals. Blurring by Bellingcat.

In the post below, one member asks, “A2 dapet apa?” – “What does A2 (Rp 200,000; US$12) get you?”

Screenshot from the Facebook group ‘Buying and Selling civets/pets in the Greater Jakarta area,’ posted on Facebook, August 6, 2024.

The post received 69 replies, with members offering everything from otters to owls, civets and geckos.

The term “Wc” – a common shorthand in animal trading groups for “wild-caught” – was also frequently used across all nine groups. Under Indonesian law, even if a species is not listed as vulnerable or protected, capturing and selling wild animals without a permit is illegal.


Asked whether its moderation systems could detect cost codes (as text or embedded in images) or key terms such as WC (when found next to images of animals), Meta responded: 

“Bad actors constantly evolve their tactics to avoid enforcement, which is why we partner with groups like the World Wildlife Fund and invest in tools and technology to detect and remove violating content.”

The Operators

While investigating the operators behind all nine groups, Bellingcat identified six Facebook profiles that led back to one individual broker based in Jakarta. 

The “People” tab in one of the groups lists its admins and moderators, including an account referenced below as AB. Although AB’s profile was locked, searching the group for the term “wa.” (part of WhatsApp’s click-to-chat links) returned dozens of animal adverts alongside a phone number.

Screenshot of AB’s Facebook post including a phone number. Posted June 11, 2025.

Searching for AB’s historic posts using the phone number revealed adverts for vulnerable species in six of the nine groups under investigation, including this advert for a binturong.

Screenshot of an advert for a “Bintu”, short for binturong. Posted by AB, September 2024.

The binturong is listed as vulnerable by the International Union for Conservation of Nature (IUCN), and under Indonesian law even keeping one, let alone trading it commercially, is prohibited.

AB has also advertised this “Celepuk Wc”, a wild-caught scops owl, seen below. Although the species itself is not protected, selling a wild-caught owl in Indonesia without a permit (permits are tightly regulated) violates Indonesian law.

Owls for sale, posted by AB. Left: Labelled “Wc” for wild-caught. Right: “BC” for bred in captivity. 

Following the phone number shared by AB uncovered five more Facebook profiles. The six profiles frequently shared similar adverts for the same species, often within days of each other, sometimes featuring a similar interior background, and always listing the same telephone number.

Six different accounts posting similar-looking animal adverts, while all using the same contact phone number.

Late last year, one of the accounts, referenced below as W, posted this wreathed hornbill, a protected species in Indonesia.

Screenshot of an advert for a wreathed hornbill. Posted by Waa, November 2025. 

Of the six profiles, only one, named Azie Soka Smithh, has ever posted personal data, including a profile picture of a man with a child.

An advert for a civet, posted by Azie Soka Smithh and tagging the same phone number as used by the other five accounts.

Further investigation into Azie Soka Smithh confirmed their presence on other platforms, including Telegram and Instagram. However, their full legal name remained unknown. A search for visual clues to their location showed that the vast majority of images had been tightly cropped, revealing little about their whereabouts – except for a handful of images that appeared to have been taken at the same location: a pet shop.

In the adverts shown below, a poster on the wall behind the cage displays the shop name Station Sato Exotic and a phone number. Of all the images seemingly taken in the same shop, none featured species protected under Indonesian law. However, the long-tailed macaque shown below is considered endangered by the IUCN due to declining numbers in the wild.

Adverts posted by two different accounts but with the same shop name and phone number visible in the background. The right image features a long-tailed macaque.

A Google search for the shop’s name and number returned a Google Maps listing for Station Sato Exotic. A man named “beni” had left a five-star rating as well as several dozen photos and videos of the pet shop’s interior, including one that appeared to show a man sitting next to a poster identical to the one seen in the animal adverts.

Screenshot of Beni’s Google review, including (right) a video of a man sitting beside a poster for Station Sato Exotic. Posted July 2021.

According to beni’s Google account, his full name is Beni Abdul Hamid (translated from Arabic). His bio reads: “We sell various kinds of accessories, cages, animal feed, etc” (translated from Bahasa Indonesia).

Of the 16 photos and 25 videos posted by Beni, several showed a left hand holding animals up to the camera, with a distinctive mole visible on the wrist. A seemingly identical mole appeared in several of the adverts posted by the six Facebook accounts sharing the same phone number. Notably, the hand was not seen holding any species protected under Indonesian law. However, the long-tailed macaque shown below is considered endangered by the IUCN.

A distinctive mole appears in multiple animal adverts posted by (left) Beni on Google Listings, (centre) AB on Facebook and (right) another of the six accounts using the shared phone number. The centre and right images feature a long-tailed macaque.

Upon visiting Station Sato Exotic, our partners at Mongabay confirmed that Google reviewer Beni Abdul Hamid was in fact the owner. His son, Jordan Bastian, who was present on the day, told their reporter he now manages the shop on his father’s behalf.

Bastian confirmed that it was his wrist and mole in the adverts and that he had taken all of the photos inside the shop. However, he said he was not behind any of the six Facebook accounts and that they were most likely run by a local broker. He explained that his business relies on a network of brokers operating on Facebook and WhatsApp. He sends them photos of the animals he has for sale, and they handle sourcing and organising everything with the buyer in exchange for a cut of the profits.

“I’m a broker. I’m involved in marketing the animals, so I provide the photos,” said Bastian. “I don’t want to know about the buyer.”

When shown the Facebook account for Azie Soka Smithh, Bastian confirmed that the man in the profile picture was a local broker, but one who seldom visited the shop.

Station Sato Exotic Pet Shop also has an online presence on Tokopedia, a major Indonesian marketplace. The platform’s guidelines prohibit the sale of endangered species, but are not clear regarding the sale of other animals, including pets.

Of Station Sato Exotic’s 71 current listings, the vast majority have been miscategorised. Animals are listed as tools, toys, aquarium decorations and books. Some are also listed as other species; for example, birds and squirrels have been listed as hamsters or reptiles.

One advert features a vulnerable cuckoo species, the Sunda Coucal. Endemic to Java and numbering fewer than 10,000, this bird has been listed as vulnerable since 1994.

Screenshot of Station Sato Exotic’s Tokopedia page promoting the sale of a vulnerable cuckoo species. The page reports that four birds have already been sold. 

Asked whether he had sold many animals via Tokopedia, Bastian said his account had been blocked after he was banned for selling squirrels. When shown the advert above for the Sunda Coucal, he said he was surprised to learn it was classified as vulnerable. Tokopedia did not respond to requests for comment regarding an advert for a vulnerable species appearing on their platform. 

On the sale of protected or vulnerable species more broadly, Bastian admitted he had sold them in the past but said he has since stopped, explaining that “the risk is big” and that he prefers to “play it safe.”

After Bellingcat contacted the local authorities for comment, three officers from the West Java Natural Resources Conservation Agency (BBKSDA) made a surprise visit to Station Sato Exotic, as the shop had previously been reported for selling protected species. Head of Conservation Stephanus Hanny said that upon arrival, “We went inside and checked every animal… We did not find any protected species.” He added that even the sale of non-protected wildlife requires a permit, which the shop does not currently hold. However, since operating without one is not a criminal offence, Hanny said they could only issue the owners with a warning.

Bellingcat also contacted the phone number associated with Azie Soka Smithh. The person replied, confirming they managed all six accounts but denied selling any animals, including protected and vulnerable species. “I’m just a hobbyist. An animal lover,” they said. 

Support Bellingcat

Your donations directly contribute to our ability to publish groundbreaking investigations and uncover wrongdoing around the world.

Given that the account had been found advertising vulnerable and protected species for sale, the Indonesian Director General of Forestry Law Enforcement, Dwi Januanto Nugroho, said authorities would investigate. Asked how their team of investigators was adapting to the illegal wildlife trade growing online, Nugroho replied:

“Criminal behaviour continues to reproduce itself in order to survive. In fact, it can evolve faster than the law enforcement system itself. In response …cyber patrols and desk analysis via the operations room will continue to be intensified, while we further optimise support from volunteer networks, working partners, and public participation.”

After Bellingcat contacted Meta, all six accounts, including Azie Soka Smithh’s, and all nine groups, totalling 70,000 members, were shut down. Meta confirmed: “We removed the Facebook groups and profiles in question for violating our Restricted Goods and Services Policy.”

Merel Zoet and Claire Press contributed to this report.

Bellingcat is a non-profit and the ability to carry out our work depends on the kind support of individual donors. If you would like to support our work, you can do so here. You can also subscribe to our Patreon channel here. Subscribe to our Newsletter and follow us on Bluesky here and Mastodon here.


AI Used to Promote Non-Existent Evacuation Flights From the Middle East

The Netherlands’ largest newspaper, De Telegraaf, recently published an interview with a woman claiming to organise her own evacuation flights from Dubai, selling seats at €1,600 (US$1,850) each. Four days later, her photo was removed from the article, though the interview remained.

Bellingcat has found that the original image not only includes artefacts commonly associated with generative AI, but that the flights referenced in the article do not appear to exist.


The story came at a time when thousands of Dutch people were reportedly seeking urgent ways to leave the region following Iranian missile and drone strikes across the Gulf in retaliation for US-Israeli strikes.

Published on De Telegraaf’s website on March 5, the article ran under the headline: “Dutch people in the Middle East feel abandoned by the government: We just rented a plane ourselves.”

The Dutch minister of foreign affairs was confronted with this headline during a television interview, in which he described ongoing efforts by the Dutch government to repatriate citizens to the Netherlands.

The article features interviews with several Dutch people struggling to leave Dubai and Abu Dhabi, including Tamara Harema. Under the subheading “Dutch people hire their own plane”, Harema says she was “rebooked five times by Emirates” and that the official repatriation flights organised by the Dutch government were not ‘taking off’.

As part of a group, she says, they are organising buses and have hired an Airbus A321 to fly home. Harema is quoted as saying: “The first plane is already full, so we’re organising a second flight. Stranded travellers can contact us.”

However, several discrepancies in Harema’s photo, published in the original article, suggest it was AI-generated. No trace of a person matching Harema’s face or profile could be found, and flight-tracking data suggests no such plane took off.

The Photo

In the image below, the world’s tallest structure, Burj Khalifa, can be seen through the window overlooking the Dubai skyline. Each side of the tower is unique, with platforms that protrude at different heights and in different directions. It also contains several mechanical floors, which appear as dark bands in the photo.

Photo description as published by De Telegraaf: “Tamara Harema and a group organise their own flights to the Netherlands, for which they have rented an Airbus A321. ‘Otherwise, nothing would get off the ground.’ © Own photo.” Source: De Telegraaf, March 5.

By cross-checking the height of the visible platforms together with the location of the mechanical floors, it’s possible to determine that Harema’s hotel room faces north-west, towards the Burj Khalifa’s south-east-facing facade.

Comparing Harema’s photo (bottom left) to all three sides of Burj Khalifa’s base suggests she is looking at the south-east facade. Source: Harema’s image / Google Street View.

Several discrepancies are visible when comparing Harema’s photo with other images of the building, including an upper mechanical floor appearing higher than in other images and the absence of the water feature at the base of the building.

Harema’s image (left), compared to a screenshot of a video of the building from 2020 (right), suggests a discrepancy between the upper mechanical floors. The water feature is also absent. Source: Harema’s image / Youtube.

To establish whether Harema’s photo could have been taken several years earlier, Google Street View imagery was analysed from 2013 onwards. No match could be found when comparing the arrangement of buildings at the base of the Burj Khalifa.

In Harema’s photo, the arrangement of buildings at the base of the tower does not match historic Google Street View images. Source Harema’s image/ Google Street View.

Several other irregularities, as shown below, including the hotel room furniture and details of Harema’s clothing and jewellery, also suggest it may have been AI-generated.

(Left) a distorted lamp stand; (top right) blurring on the “V” of her T-shirt; (bottom right) an earring that appears to merge into her face – all discrepancies commonly associated with generative AI.


Fully Booked Airbus A321

Regarding whether the plane existed, Harema says in her interview that buses have already been arranged to collect passengers from two locations in Dubai on Saturday, March 7, after which a 232-seater Airbus A321 will depart from Muscat, Oman, for the Netherlands.

The article notes the cost is €1,600 (US$1,850) per person, without detours. “Although we read that a Dutch repatriation flight costs €600, just try getting on such a flight,” says Harema.

According to Flightradar24, multiple A321s departed Muscat on March 7 and 8, but none bound for the Netherlands. The only aircraft that did arrive in Amsterdam from Muscat were either government-organised repatriation flights or scheduled Oman Air services, none of which were Airbus A321s.

Two Airbus A321s were recorded on the ground at Muscat Airport on March 7. One, belonging to Gulf Air, later departed for Rome via Riyadh on March 8. The other, operated by SalamAir, had been flying routes between Oman and Bangladesh until March 3, but has since remained in Muscat.


After contacting De Telegraaf, an explanation for the photo’s removal was added at the bottom of the article, stating that the photo did “likely not meet our journalistic guidelines.”

The newspaper’s deputy editor-in-chief, Joost de Haas, added:

“Regarding the quoted Tamara Harema, the editors contacted her after Mr. Chizki Loonstein—a long-standing source for one of our reporters—informed us about attempts to charter a plane. Mr Loonstein informed us that Ms Harema stayed in Dubai and could tell us more about it. This led to messages from which several quotes from Harema were extracted, as reproduced in the relevant passage of the article.”

A search for Loonstein led to a six-month-old report from another Dutch newspaper, NRC, which claimed that Loonstein, a lawyer, emigrated to Dubai after his legal company went bankrupt, leaving his clients, victims of fraud, worse off.

Contacted for comment, Loonstein confirmed that he knew Harema and had shared her contact details in “an app group” in relation to a flight from Muscat to Amsterdam. After this contact, Bellingcat sent him the photo of Harema to confirm her identity and asked him to share Harema’s contact details. In response, Loonstein refused to provide further comment. 


Merel Zoet and Claire Press contributed to this report.



LLMs Vs. Geolocation: GPT-5 Performs Worse Than Other AI Models

In June, Bellingcat ran 500 geolocation tests, comparing LLMs from various companies against each other, as well as Google Lens – a staple tool for finding the location of photos.

At the time, ChatGPT o4-mini-high emerged as the clear winner, with Google Lens outperforming most other models. Just two months later, with new versions of these AI tools available, we re-ran the trial – this time adding Google “AI Mode,” GPT-5, GPT-5 Thinking, and Grok 4 to the mix.

These five photos were excluded from our most recent trial as they were published in our previous article.

The original test used 25 of Bellingcat’s own holiday photos. From cities to remote countryside, the images included scenes both with and without recognisable features – such as roads, signage, mountains, or architecture. Images were sourced from every continent.

For the updated trial, five test photos were excluded, as they had appeared in a previous article, thus compromising the integrity of the results.

All 24 models’ responses were ranked on a scale from 0 to 10, with 10 indicating an accurate and specific identification (such as a neighbourhood, trail, or landmark) and 0 indicating no attempt to identify the location at all.


Google AI Mode was shown to be the most capable geolocation tool overall. 

Grok 4 gave both better and worse answers compared to Grok 3 but, on average, scored marginally higher. However, it was still less accurate than older versions of Gemini and GPT. 

GPT-5, even in ‘Thinking’ and ‘Pro’ modes, was a considerable downgrade from the capabilities demonstrated by GPT o4-mini-high. In one example, a photo of a city street with skyscrapers in the background, o4-mini-high correctly identified the street, while GPT-5 in Thinking mode pointed to the wrong country.


Despite delivering faster answers, GPT-5 appeared to sacrifice accuracy. A surprising number of errors and a general sense of disappointment in the new model have also been reported by other users.

Bellingcat tested GPT-5 and its ‘Thinking’ mode via the Plus subscription, which costs roughly the same as access to o4-mini-high prior to its retirement. Five of the most difficult test images were also run through GPT-5 Pro. But even Pro, with a premium price tag of €200 per month, failed to geolocate the photos any more accurately than GPT o4-mini-high.

A Beach, a Hotel and a Ferris Wheel

The disparity between Google and the GPT models became even more apparent in Test 25 – a photo of a shoreline hotel in Noordwijk, the Netherlands, with a Ferris wheel rising just beyond the dunes.

Test 25: A photo of Noordwijk beach in the Netherlands. Credit: Bellingcat.

In the previous trial, most older models – including those from GPT, Claude, Gemini and Grok – accurately identified the country as the Netherlands but failed to locate the town. Many latched onto the Ferris wheel but pointed instead to the seaside town of Scheveningen, which also has a Ferris wheel, though situated on a pier, not among the sand dunes.

However, the most recent models, GPT-5 Pro and Thinking, were even less accurate, identifying a beach in France – an entirely different country. 

Unfortunately for open source researchers, following the release of GPT-5, OpenAI removed the option to select older models such as o4-mini-high. After a wave of negative feedback, OpenAI reinstated GPT-4o as the default model for paid subscribers. However, the most capable geolocation models identified in Bellingcat’s testing remain inaccessible.

Google AI Mode, on the other hand, was the first, and only model so far, to correctly identify Noordwijk as the location in Test 25.  

Though AI Mode is powered by a version of Gemini 2.5, it outperformed Gemini 2.5 Pro Deep Research in these tests. Described by Google as its “most powerful AI search, with more advanced reasoning and multimodality,” AI Mode geolocated test images with greater accuracy than any GPT models, including our previous winner, o4-mini-high.

AI Mode is currently only available in India, the United Kingdom and the United States.

Credit: Google.

The majority of models, at some point, returned a hallucination. Users should not rely solely on the answers provided by LLMs. Even the best options, including Google AI Mode, still, at times, confidently point to the wrong location. 

The difference in models’ capabilities compared with just two months ago shows how quickly this field is evolving. However, OpenAI’s recent changes also suggest that progress is not guaranteed, and that AI’s ability to geolocate may plateau or even worsen over time. As new models emerge, Bellingcat will continue to test them.

Thanks to Nathan Patin for contributing to the original benchmark tests.




Have LLMs Finally Mastered Geolocation?

An ambiguous city street, a freshly mown field, and a parked armoured vehicle were among the example photos we chose to challenge Large Language Models (LLMs) from OpenAI, Google, Anthropic, Mistral and xAI to geolocate. 

Back in July 2023, Bellingcat analysed the geolocation performance of OpenAI and Google’s models. Both chatbots struggled to identify images and were highly prone to hallucinations. However, since then, such models have rapidly evolved. 

To assess how LLMs from OpenAI, Google, Anthropic, Mistral and xAI compare today, we ran 500 geolocation tests, with 20 models each analysing the same set of 25 images. 

We chose 25 of our own travel photos, varying in difficulty to geolocate, none of which had been published online before.

Our analysis included older and “deep research” versions of the models, to track how their geolocation capabilities have developed over time. We also included Google Lens to compare whether LLMs offer a genuine improvement over traditional reverse image search. While reverse image search tools work differently from LLMs, they remain one of the most effective ways to narrow down an image’s location when starting from scratch.

The Test

We used 25 of our own travel photos, to test a range of outdoor scenes, both rural and urban areas, with and without identifiable landmarks such as buildings, mountains, signs or roads. These images were sourced from every continent, including Antarctica. 

The vast majority have not been reproduced here, as we intend to continue using them to evaluate newer models as they are released. Publishing them here would compromise the integrity of future tests.

Each LLM was given a photo that had not been published online and contained no metadata. All models then received the same prompt: “Where was this photo taken?”, alongside the image. If an LLM asked for more information, the response was identical: “There is no supporting information. Use this photo alone.”

We tested the following models:

Developer | Model | Developer’s Description
Anthropic | Claude Haiku 3.5 | “fastest model for daily tasks”
Anthropic | Claude Sonnet 3.7 | “our most intelligent model yet”
Anthropic | Claude Sonnet 3.7 (extended thinking) | “enhanced reasoning capabilities for complex tasks”
Anthropic | Claude Sonnet 4.0 | “smart, efficient model for everyday use”
Anthropic | Claude Opus 4.0 | “powerful, large model for complex challenges”
Google | Gemini 2.0 Flash | “for everyday tasks plus more features”
Google | Gemini 2.5 Flash | “uses advanced reasoning”
Google | Gemini 2.5 Pro | “best for complex tasks”
Google | Gemini Deep Research | “get in-depth answers”
Mistral | Pixtral Large | “frontier-level image understanding”
OpenAI | ChatGPT 4o | “great for most tasks”
OpenAI | ChatGPT Deep Research | “designed to perform in-depth, multi-step research using data on the public web”
OpenAI | ChatGPT 4.5 | “good for writing and exploring ideas”
OpenAI | ChatGPT o3 | “uses advanced reasoning”
OpenAI | ChatGPT o4-mini | “fastest at advanced reasoning”
OpenAI | ChatGPT o4-mini-high | “great at coding and visual reasoning”
xAI | Grok 3 | “smartest”
xAI | Grok 3 DeepSearch | “advanced search and reasoning”
xAI | Grok 3 DeeperSearch | “extended search, more reasoning”

This was not a comprehensive review of all available models, partly due to the speed at which new models and versions are currently being released. For example, we did not assess DeepSeek, as it currently only extracts text from images. Note that in ChatGPT, regardless of which model you select, the “deep research” function is currently powered by a version of o4-mini.


Gemini models have been released in “preview” and “experimental” formats, as well as dated versions like “03-25” and “05-06”. To keep the comparisons manageable, we grouped these variants under their respective base models, e.g. “Gemini 2.5 Pro”. 

We also compared every test with the first 10 results from Google Lens’s “visual match” feature, to assess the difficulty of the tests and the usefulness of LLMs in solving them. 

We ranked all responses on a scale from 0 to 10, with 10 indicating an accurate and specific identification, such as a neighbourhood, trail, or landmark, and 0 indicating no attempt to identify the location at all.
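To illustrate how a per-image 0-to-10 rubric translates into an overall ranking, here is a minimal sketch. The scores dictionary below is invented example data, not the trial’s actual results, and the variable names are our own.

```python
# Hypothetical illustration of the scoring method: each model receives a
# 0-10 score per test image, and models are compared on their average
# across all images. The numbers below are invented examples only.
from statistics import mean

scores = {
    "ChatGPT o4-mini-high": [10, 7, 9, 4],
    "Google Lens":          [9, 6, 8, 3],
    "Claude Opus 4.0":      [5, 2, 6, 1],
}

# Sort model names by their average score, highest first.
ranking = sorted(scores, key=lambda m: mean(scores[m]), reverse=True)
for model in ranking:
    print(f"{model}: {mean(scores[model]):.1f}")
```

Averaging across all 25 images means a model that nails a few easy tests but refuses or hallucinates on the rest still scores poorly overall.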

And the Winner is…

ChatGPT beat Google Lens.

In our tests, ChatGPT o3, o4-mini, and o4-mini-high were the only models to outperform Google Lens in identifying the correct location, though not by a large margin. All other models were less effective when it came to geolocating our test photos.


We scored 20 models against 25 photos, rating each from 0 (red) to 10 (dark green) for accuracy in geolocating the images.

Even Google’s own LLM, Gemini, fared worse than Google Lens. Surprisingly, it also scored lower than xAI’s Grok, despite Grok’s well-documented tendency to hallucinate. Gemini’s Deep Research mode scored roughly the same as the three Grok models we tested, with DeeperSearch proving the most effective of xAI’s LLMs.

The highest-scoring models from Anthropic and Mistral lagged well behind their current competitors from OpenAI, Google, and xAI. In several cases, even Claude’s most advanced models identified only the continent, while others were able to narrow their responses down to specific parts of a city. The latest Claude model, Opus 4, performed at a similar level to Gemini 2.5 Pro. 

Here are some of the highlights from five of our tests.

A Road in the Japanese Mountains

The photo below was taken on the road between Takayama and Shirakawa in Japan. As well as the road and mountains, signs and buildings are also visible.

Test “snowy-highway” depicted a road near Takayama, Japan.

Gemini 2.5 Pro’s response was not useful. It mentioned Japan, but also Europe, North and South America and Asia. It replied:

“Without any clear, identifiable landmarks, distinctive signage in a recognisable language, or unique architectural styles, it’s very difficult to determine the exact country or specific location.”

In contrast, o3 identified both the architectural style and signage, responding:

“Best guess: a snowy mountain stretch of central-Honshu, Japan—somewhere in the Nagano/Toyama area. (Japanese-style houses, kanji on the billboard, and typical expressway barriers give it away.)”

A Field on the Swiss Plateau

This photo was taken near Zurich. It showed no easily recognisable features apart from the mountains in the distance. A reverse image search using Google Lens didn’t immediately lead to Zurich. Without any context, identifying the location of this photo manually could take some time. So how did the LLMs fare?

Test “field-hills” depicted a view of a field near Zurich.

Gemini 2.5 Pro stated that the photo showed scenery common to many parts of the world and that it couldn’t narrow it down without additional context. 

By contrast, ChatGPT excelled at this test. o4-mini identified the “Jura foothills in northern Switzerland”, while o4-mini-high placed the scene “between Zürich and the Jura mountains”.

These answers stood in stark contrast to those from Grok 3 DeepSearch, which, despite the visible mountains, confidently stated the photo was taken in the Netherlands. This conclusion appeared to be based on the Dutch name of the account used, “Foeke Postma”, with the model assuming the photo must have been taken there and calling it a “reasonable and well-supported inference”.

An Inner-City Alley Full of Visual Clues in Singapore

This photo of a narrow alleyway on Circular Road in Singapore provoked a wide range of responses from the LLMs and Google Lens, with scores ranging from 3 (nearby country) to 10 (correct location).

Test “dark-alley”, a photo taken of an alleyway in Singapore.

The test served as a good example of how LLMs can outperform Google Lens by focusing on small details in a photo to identify the exact location. Those that answered correctly referenced the writing on the mailbox on the left in the foreground, which revealed the precise address.

While Google Lens returned results from all over Singapore and Malaysia, part of ChatGPT o4-mini’s response read: “This appears to be a classic Singapore shophouse arcade – in fact, if you look at the mailboxes on the left you can just make out the label ‘[correct address].’”  

Some of the other models noticed the mailbox but could not read the address visible in the image, falsely inferring that it pointed to other locations. Gemini 2.5 Flash responded, “The design of the mailboxes on the left, particularly the ‘G’ for Geylang, points strongly towards Singapore.” Another Gemini model, 2.5 Pro, spotted the mailbox but focused instead on what it interpreted as Thai script on a storefront, confidently answering: “The visual evidence strongly suggests the photo was taken in an alleyway in Thailand, likely in Bangkok.”  

The Costa Rican Coast

One of the harder tests we gave the models to geolocate was a photo taken from Playa Langosta on the Pacific Coast of Costa Rica, near Tamarindo. 

Test “beach-forest” showed Playa Langosta, Costa Rica.

Gemini and Claude performed the worst on this task, with most models either declining to guess or giving incorrect answers. Claude 3.7 Sonnet correctly identified Costa Rica but hedged with other locations, such as Southeast Asia. Grok was the only model to guess the exact location correctly, while several ChatGPT models (Deep Research, o3 and the o4-minis) guessed within 160km of the beach.
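Judging a guess as "within 160km" implies measuring the great-circle distance between the guessed coordinates and the true location. A minimal haversine sketch, using rough hypothetical coordinates for illustration (not the actual guesses from our tests):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))  # 6371 km = mean Earth radius

# Hypothetical example: approximate Tamarindo-area coordinates vs a
# model's guess further down the Pacific coast.
true_lat, true_lon = 10.30, -85.86
guess_lat, guess_lon = 9.60, -85.10
print(f"Guess was {haversine_km(true_lat, true_lon, guess_lat, guess_lon):.0f} km off")
```

A distance-based measure like this makes near misses comparable across models, rather than treating every wrong answer as equally wrong.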

An Armoured Vehicle on the Streets of Beirut

This photo was taken on the streets of Beirut and features several details useful for geolocation, including an emblem on the side of the armored personnel carrier and a partially visible Lebanese flag in the background. 

Test “street-military” depicted an armoured personnel carrier on the streets of Beirut.

Surprisingly, most models struggled with this test: Claude 4 Opus, billed as a “powerful, large model for complex challenges”, guessed “somewhere in Europe” owing to the “European-style street furniture and building design”, while Gemini and Grok could only narrow the location down to Lebanon. Half of the ChatGPT models responded with Beirut. Only two models, both ChatGPT, referenced the flag.  

So Have LLMs Finally Mastered Geolocation?

LLMs can certainly help researchers spot details that Google Lens misses, or that they themselves might overlook.

One clear advantage of LLMs is their ability to search in multiple languages. They also appear to make good use of small clues, such as vegetation, architectural styles or signage. In one test, a photo of a man wearing a life vest in front of a mountain range was correctly located because the model identified part of a company name on his vest and linked it to a nearby boat tour operator.

For touristic areas and scenic landscapes, Google Lens still outperformed most models. When shown a photo of Schluchsee lake in the Black Forest, Germany, Google Lens returned it as the top result, while ChatGPT was the only LLM to correctly identify the lake’s name. In contrast, in urban settings, LLMs excelled at cross-referencing subtle details, whereas Google Lens tended to fixate on larger, similar-looking structures, such as buildings or ferris wheels, which appear in many other locations.


Heat map showing how each model performed on all 25 tests.

Enhanced Reasoning Modes

You’d assume that turning on “deep research” or “extended thinking” functions would result in higher scores. However, on average, Claude and ChatGPT performed worse with these modes enabled. Only one Grok model, DeeperSearch, and one Gemini model, Gemini Deep Research, showed improvement. For example, ChatGPT Deep Research was shown a photo of a coastline and took nearly 13 minutes to produce an answer that was about 50km north of the correct location. Meanwhile, o4-mini-high responded in just 39 seconds and gave an answer 15km closer.  

Overall, Gemini was more cautious than ChatGPT, but Claude was the most cautious of all. Claude’s “extended thinking” mode made Sonnet even more conservative than the standard version. In some cases, the regular model would hazard a guess, albeit hedged in probabilistic terms, whereas with “extended thinking” enabled for the same test, it either declined to guess or offered only vague, region-level responses.

LLMs Continue to Hallucinate

All the models, at some point, returned answers that were entirely wrong. ChatGPT was typically more confident than Gemini, often leading to better answers, but also more hallucinations. 

The risk of hallucinations increased when the scenery was temporary or had changed over time. In one test, for instance, a beach photo showed a large hotel and a temporary ferris wheel (installed in 2024 and dismantled during winter). Many of the models consistently pointed to a different, more frequently photographed beach with a similar ride, despite clear differences.

Final Tips

Your account and prompt history may bias results. In one case, when analysing a photo taken in the Coral Pink Sand Dunes State Park, Utah, ChatGPT o4-mini referenced previous conversations with the account holder: “The user mentioned Durango and Colorado earlier, so I suspect they might have posted a photo from a previous trip.” 

Similarly, Grok appeared to draw on a user’s Twitter profile, and past tweets, even without explicit prompts to do so. 

Video comprehension also remains limited. Most LLMs cannot search for or watch video content, cutting off a rich source of location data. They also struggle with coordinates, often returning rough or simply incorrect responses. 

Ultimately, LLMs are no silver bullet. They still hallucinate, and when a photo lacks detail, geolocating it will still be difficult. That said, unlike our controlled tests, real-world investigations typically involve additional context. While Google Lens accepts only keywords, LLMs can be supplied with far richer information, making them more adaptable.

There is little doubt that, at the rate they are evolving, LLMs will continue to play an increasingly significant role in open source research. And as newer models emerge, we will continue to test them. 

Infographics by Logan Williams and Merel Zoet

