Model Leaderboard

Context: Intelligence tests that LLMs fail and why

Comments? Suggestions? Email Kai Hendry

  1. gpt-4-turbo-preview: 4
  2. claude-3-haiku-20240307 from github.com/gptscript-ai/anthropic-provider: 3
  3. mistral-large-latest from https://api.mistral.ai/v1: 2
  4. Mistral-7B-Instruct-v0.2 from https://mistral-7b-instruct-v02.endpoints.kepler.ai.cloud.ovh.net/api/openai_compat/v1: 0
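The scores above match the number of tests on this page where each model's result reads "Ok: true". A minimal sketch of that tally follows. The passing test numbers are transcribed from the results on this page; the `PASSED` table and the `leaderboard` function are illustrative names, not part of the original test harness.

```python
# Test numbers where each model's result on this page is "Ok: true".
PASSED = {
    "gpt-4-turbo-preview": {4, 5, 9, 12},
    "claude-3-haiku-20240307": {9, 11, 12},
    "mistral-large-latest": {5, 12},
    "Mistral-7B-Instruct-v0.2": set(),
}

def leaderboard(passed):
    """Return (model, score) pairs sorted by score, highest first."""
    scores = [(model, len(tests)) for model, tests in passed.items()]
    return sorted(scores, key=lambda pair: -pair[1])

for rank, (model, score) in enumerate(leaderboard(PASSED), start=1):
    print(f"{rank}. {model}: {score}")
```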

1.gpt question prompt

Write a grammatically correct sentence with 10 words such that each word starts with the last letter of the previous word.

Answers

Correct answer

The engine exceeds safe engine engineering guidelines, so owners salivated.
The elephant tour reminds Steven; never ride elephants seated down.
Brenda always surrenders so ordinary yellow will last to Orlando.
White earplugs sometimes save everyone energy, Your Royal Lordship Prince.
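The chain rule in this prompt can be checked mechanically. The sketch below is an independent Python illustration, not the Go test named in the results. It assumes punctuation and letter case are ignored, and that the chain is checked before the word count; the function name `check_chain` is hypothetical.

```python
import re

def check_chain(sentence, expected_words=10):
    """Return (ok, description) for the last-letter/first-letter chain rule."""
    # Keep only alphabetic runs, lowercased, so punctuation and case are ignored.
    words = [w.lower() for w in re.findall(r"[A-Za-z]+", sentence)]
    # Check the chain first: each word must start with the previous word's last letter.
    for prev, cur in zip(words, words[1:]):
        if prev[-1] != cur[0]:
            return False, (f'last letter of "{prev}" is not the same as '
                           f'the first letter of "{cur}"')
    # Then check the word count (assumed here to mean exactly 10 words).
    if len(words) != expected_words:
        return False, f"expected {expected_words} words, got {len(words)}"
    return True, "chain holds"
```

Under these assumptions, `check_chain("The elephant tour reminds Steven; never ride elephants seated down.")` passes, while the pangram about the quick brown fox fails at the first word pair.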

claude-3-haiku-20240307 from github.com/gptscript-ai/anthropic-provider

The quick brown fox jumps over the lazy dog.

Test type: test.1.gpt.anthropic.go

Ok: false

Description: last letter of "the" is not the same as the first letter of "quick"

mistral-large-latest from https://api.mistral.ai/v1

Eager rabbits scamper rapidly, yelping gleefully. Darting swifts twirl, landing deftly.

Test type: test.1.gpt.mistral.go

Ok: false

Description: last letter of "gleefully" is not the same as the first letter of "darting"

gpt-4-turbo-preview

Every yak knows several llamas, all seeking great treasures silently.

Test type: test.1.gpt.openai.go

Ok: false

Description: last letter of "llamas" is not the same as the first letter of "all"

Mistral-7B-Instruct-v0.2 from https://mistral-7b-instruct-v02.endpoints.kepler.ai.cloud.ovh.net/api/openai_compat/v1

"Every year, rabbits scamper swiftly, yielding yellow d

Test type: test.1.gpt.ovh.go

Ok: false

Description: last letter of "scamper" is not the same as the first letter of "swiftly"

2.gpt question prompt

Write a grammatically correct sentence with more than 10 words where the first word is the longest and the last word is the shortest.

Answers

Correct answer

Wonderful, brazen animals buzzed, zipped, crawled and sang all about me.
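A length check for this prompt can be sketched as follows. This is an illustrative Python version, not the recorded Go test. It assumes ties are allowed (another word may be as long as the first or as short as the last), which is how the recorded descriptions read, and that "more than 10 words" means at least 11. The function name is hypothetical.

```python
import re

def check_first_longest_last_shortest(sentence, min_words=11):
    """Return (ok, description) for the first-longest / last-shortest rule."""
    words = re.findall(r"[A-Za-z]+", sentence)
    first, last = words[0], words[-1]
    # No later word may be strictly longer than the first word.
    for w in words[1:]:
        if len(w) > len(first):
            return False, f"{w} is longer than {first}"
    # No earlier word may be strictly shorter than the last word.
    for w in words[:-1]:
        if len(w) < len(last):
            return False, f"{w} is shorter than {last}"
    if len(words) < min_words:
        return False, f"only {len(words)} words"
    return True, "first word is longest and last word is shortest"
```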

claude-3-haiku-20240307 from github.com/gptscript-ai/anthropic-provider

Antidisestablishmentarianism is a word that describes opposition to the disestablishment of a state religion.

Test type: test.2.gpt.anthropic.go

Ok: false

Description: is is shorter than religion

mistral-large-latest from https://api.mistral.ai/v1

"Incomprehensibly, the flamboyant peacock strutted around the garden, displaying its vibrant plumage to every creature, big or small, until dusk."

Test type: test.2.gpt.mistral.go

Ok: false

Description: the is shorter than dusk

gpt-4-turbo-preview

Characteristically, many individuals find themselves pondering the complexities of existence, why?

Test type: test.2.gpt.openai.go

Ok: false

Description: of is shorter than why

Mistral-7B-Instruct-v0.2 from https://mistral-7b-instruct-v02.endpoints.kepler.ai.cloud.ovh.net/api/openai_compat/v1

The elephant, with its long trunk reaching out to grab the nearest branch,

Test type: test.2.gpt.ovh.go

Ok: false

Description: elephant is longer than The

3.gpt question prompt

Write a grammatically correct sentence where the last word is cat and whenever the word has four letters, the rest of the words including the last must have four letters.

Answers

Correct answer

There are several variants of our breed of cat.
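Because "cat" has three letters, one reading of this prompt is that a valid sentence must end in "cat" and contain no four-letter word at all. The recorded descriptions are consistent with that reading, though it remains an interpretation. A minimal Python sketch under that assumption, with a hypothetical function name:

```python
import re

def check_cat_sentence(sentence):
    """Return (ok, description): last word must be 'cat', no word may have four letters."""
    words = re.findall(r"[A-Za-z]+", sentence)
    if not words or words[-1].lower() != "cat":
        return False, "last word is not cat"
    # "cat" has three letters, so any four-letter word breaks the rule.
    for w in words:
        if len(w) == 4:
            return False, f"{w} is four letters long"
    return True, "ends in cat with no four-letter words"
```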

claude-3-haiku-20240307 from github.com/gptscript-ai/anthropic-provider

The lazy dog chased the fat cat.

Test type: test.3.gpt.anthropic.go

Ok: false

Description: lazy is four letters long

mistral-large-latest from https://api.mistral.ai/v1

The fat rat sat on the mat with a cat.

Test type: test.3.gpt.mistral.go

Ok: false

Description: with is four letters long

gpt-4-turbo-preview

This is a cute cat.

Test type: test.3.gpt.openai.go

Ok: false

Description: This is four letters long

Mistral-7B-Instruct-v0.2 from https://mistral-7b-instruct-v02.endpoints.kepler.ai.cloud.ovh.net/api/openai_compat/v1

In this four-letter sentence, I pen four-letter words, ending with

Test type: test.3.gpt.ovh.go

Ok: false

Description: last word is not cat

4.gpt question prompt

Write me a sentence without any words that appear in The Bible.

Answers

Correct answer

Computers anthropomorphize existential puritanism.
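The recorded descriptions ("common word a found") suggest the test looks for common words rather than searching a full Bible text; that is an inference. The Python sketch below shows the general shape of such a check. The four-word set is a tiny stand-in chosen for illustration; a real check would need the complete vocabulary of a specific translation. The names `STAND_IN_WORDS` and `check_no_bible_words` are hypothetical.

```python
import re

# Placeholder vocabulary (assumption): a real check needs a full word list
# built from a chosen Bible translation.
STAND_IN_WORDS = {"a", "i", "in", "the"}

def check_no_bible_words(sentence, vocabulary=STAND_IN_WORDS):
    """Return (ok, description); fails on the first word found in the vocabulary."""
    for w in re.findall(r"[A-Za-z]+", sentence.lower()):
        if w in vocabulary:
            return False, f"common word {w} found"
    return True, "no common words found"
```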

claude-3-haiku-20240307 from github.com/gptscript-ai/anthropic-provider

Here is a sentence without any words that appear in The Bible: The quixotic zephyr zipped across the lush, verdant meadow.

Test type: test.4.gpt.anthropic.go

Ok: false

Description: common word a found

mistral-large-latest from https://api.mistral.ai/v1

Zephyrs blew through quaint, jade valleys, stirring vibrant flora and fauna in a serene, ethereal dance.

Test type: test.4.gpt.mistral.go

Ok: false

Description: common word in found

gpt-4-turbo-preview

Modern technology enables instant global communication.

Test type: test.4.gpt.openai.go

Ok: true

Description: no common words found

Mistral-7B-Instruct-v0.2 from https://mistral-7b-instruct-v02.endpoints.kepler.ai.cloud.ovh.net/api/openai_compat/v1

I. The following sentence does not contain any words that appear in The Bible:

Test type: test.4.gpt.ovh.go

Ok: false

Description: common word i found

5.gpt question prompt

Write a grammatically correct sentence where the last word is carefully and each word is shorter than the one before

Answers

Correct answer

Formaldehyde evaporates carefully
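The test types for this question end in "gpt4" rather than "go", which suggests the answers were graded by a model rather than by code (an inference from the names). The rule itself is mechanical, so a deterministic check can be sketched in Python. The function name is hypothetical, and "shorter" is taken to mean strictly fewer letters.

```python
import re

def check_shrinking(sentence, last_word="carefully"):
    """Return (ok, description): ends in last_word, each word strictly shorter than the previous."""
    words = re.findall(r"[A-Za-z]+", sentence)
    if not words or words[-1].lower() != last_word:
        return False, f"last word is not {last_word}"
    for prev, cur in zip(words, words[1:]):
        if len(cur) >= len(prev):
            return False, f"{cur} is not shorter than {prev}"
    return True, "each word is shorter than the one before"
```

For the correct answer, the word lengths are 12, 10 and 9, so the check passes.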

claude-3-haiku-20240307 from github.com/gptscript-ai/anthropic-provider

The bird flew swiftly.

Test type: test.5.gpt.anthropic.gpt4

Ok: false

Description: The answer does not meet the requirements of the query. Each word in the answer is not shorter than the one before it.

mistral-large-latest from https://api.mistral.ai/v1

"I always diligently work, focusing carefully."

Test type: test.5.gpt.mistral.gpt4

Ok: true

Description: The answer meets the requirements of the query.

gpt-4-turbo-preview

She wrote very carefully.

Test type: test.5.gpt.openai.gpt4

Ok: true

Description: The answer meets the requirements of the query.

Mistral-7B-Instruct-v0.2 from https://mistral-7b-instruct-v02.endpoints.kepler.ai.cloud.ovh.net/api/openai_compat/v1

"Elephants quietly tread, sheds."</s>

Test type: test.5.gpt.ovh.gpt4

Ok: false

Description: The answer does not meet the requirements of the query. The sentence provided does not have each word shorter than the one before.

6.gpt question prompt

Write a grammatically correct sentence without reusing any letter more than once

Answers

Correct answer

The dog barks
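A letter-reuse check can be sketched in a few lines of Python. This version ignores case, spaces and punctuation, and reports the first letter that repeats when reading left to right; the recorded test may report a different repeated letter for the same input. The function name is hypothetical.

```python
def check_no_repeated_letters(sentence):
    """Return (ok, description); fails on the first letter seen twice."""
    seen = set()
    for ch in sentence.lower():
        if not ch.isalpha():
            continue  # skip spaces and punctuation
        if ch in seen:
            return False, f"letter {ch} appears more than once"
        seen.add(ch)
    return True, "no letter appears more than once"
```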

claude-3-haiku-20240307 from github.com/gptscript-ai/anthropic-provider

The quick brown fox jumps over the lazy dog.

Test type: test.6.gpt.anthropic.go

Ok: false

Description: letter o appears more than once

mistral-large-latest from https://api.mistral.ai/v1

"Waltz, bad nymph, for quick jigs vex." This sentence is known as a pangram, which is a sentence that uses every letter of the alphabet at least once. Since you've asked for no letter to be repeated, I've chosen the shortest known perfect pangram in English, where each letter appears exactly once.

Test type: test.6.gpt.mistral.go

Ok: false

Description: letter o appears more than once

gpt-4-turbo-preview

A quick brown fox jumps.

Test type: test.6.gpt.openai.go

Ok: false

Description: letter u appears more than once

Mistral-7B-Instruct-v0.2 from https://mistral-7b-instruct-v02.endpoints.kepler.ai.cloud.ovh.net/api/openai_compat/v1

Creating a sentence without repeating any letter more than once is a challenging task

Test type: test.6.gpt.ovh.go

Ok: false

Description: letter e appears more than once

7.gpt question prompt

Name a number between 1 and 13 such that the number of characters in the number spelled out is one more than the number itself.

Answers

Correct answer

The number 2 works since “two” has 3 letters which is 1 more than 2.
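The claim can be verified by enumerating every candidate. The Python sketch below does that, and also shows one way an answer check could produce the "no word is 2" description seen in the results. Matching only the digit form "2" is an assumption based on that description; the names are hypothetical.

```python
import re

NAMES = {
    1: "one", 2: "two", 3: "three", 4: "four", 5: "five", 6: "six", 7: "seven",
    8: "eight", 9: "nine", 10: "ten", 11: "eleven", 12: "twelve", 13: "thirteen",
}

def matching_numbers(low=1, high=13):
    """Numbers whose spelled-out length is exactly one more than the number."""
    return [n for n in range(low, high + 1) if len(NAMES[n]) == n + 1]

def check_answer(response, expected=2):
    """Return (ok, description); passes if the expected digit appears as its own token."""
    tokens = re.findall(r"[A-Za-z0-9]+", response)
    if str(expected) in tokens:
        return True, f"found {expected}"
    return False, f"no word is {expected}"
```

Over the range 1 to 13, `matching_numbers()` returns only 2.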

claude-3-haiku-20240307 from github.com/gptscript-ai/anthropic-provider

5

Test type: test.7.gpt.anthropic.go

Ok: false

Description: no word is 2

mistral-large-latest from https://api.mistral.ai/v1

The number is "eleven". It is between 1 and 13, and the number of characters in the word "eleven" is one more than the number itself (6 characters, 5+1).

Test type: test.7.gpt.mistral.go

Ok: false

Description: no word is 2

gpt-4-turbo-preview

Four

Test type: test.7.gpt.openai.go

Ok: false

Description: no word is 2

Mistral-7B-Instruct-v0.2 from https://mistral-7b-instruct-v02.endpoints.kepler.ai.cloud.ovh.net/api/openai_compat/v1

The number you are looking for is 6. The number six spelled out

Test type: test.7.gpt.ovh.go

Ok: false

Description: no word is 2

8.gpt question prompt

Is it possible to find a sequence of 10 symbols using only the symbols x and y such that whenever xy appears it must be preceded by yyy? If so, give an example.

Answers

Correct answer

xxxxxxxxxx
yyyyyyyyyy
yyyxyyyyyy
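The rule can be checked mechanically. A minimal Python sketch, assuming "preceded by yyy" means the three symbols immediately before each occurrence of xy are all y. The function name is illustrative, and the recorded tests may apply a different procedure.

```python
def satisfies_rule(seq, length=10):
    """True if seq uses only x and y, has the given length, and every 'xy' follows 'yyy'."""
    if len(seq) != length or set(seq) - {"x", "y"}:
        return False
    i = seq.find("xy")
    while i != -1:
        # An occurrence needs at least three symbols before it, all of them y.
        if i < 3 or seq[i - 3:i] != "yyy":
            return False
        i = seq.find("xy", i + 1)
    return True
```

All three sequences in the correct answer satisfy this check. A sequence such as yyxyyyxyyx does not, because its first xy has only two symbols before it.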

claude-3-haiku-20240307 from github.com/gptscript-ai/anthropic-provider

Yes, it is possible to find a sequence of 10 symbols using only the symbols x and y such that whenever xy appears, it must be preceded by yyy. An example of such a sequence is: yyxyyyxyyx

Test type: test.8.gpt.anthropic.go

Ok: false

Description:

mistral-large-latest from https://api.mistral.ai/v1

Yes, here is an example sequence: "yyyxyyyxyy". In this sequence, whenever "xy" appears, it is always preceded by "yyy".

Test type: test.8.gpt.mistral.go

Ok: false

Description:

gpt-4-turbo-preview

Yes, it is possible. An example is: yyyxyyyxyy.

Test type: test.8.gpt.openai.go

Ok: false

Description:

Mistral-7B-Instruct-v0.2 from https://mistral-7b-instruct-v02.endpoints.kepler.ai.cloud.ovh.net/api/openai_compat/v1

Based on the given conditions, it is not possible to find a sequence of

Test type: test.8.gpt.ovh.go

Ok: false

Description:

9.gpt question prompt

Name an animal such that the length of the word is equal to the number of legs they have minus the number of tails they have.

Answers

Correct answer

dog

claude-3-haiku-20240307 from github.com/gptscript-ai/anthropic-provider

Ant

Test type: test.9.gpt.anthropic.gpt4

Ok: true

Description: The answer "Ant" meets the requirements of the query.

mistral-large-latest from https://api.mistral.ai/v1

The animal that fits your criteria is a "spider". The word "spider" has six letters, which is equal to the number of legs a spider has (eight) minus the number of tails it has (two, if we consider the pedipalps as tails, or zero if we don't). However, it's important to note that spiders typically don't have tails in the traditional sense. They have pedipalps, which are sensory appendages near their mouths, but these are not tails. If we consider only true tails, then the word "crab" also fits, as it has four letters, and crabs have four pairs of legs (eight legs total) and no tail.

Test type: test.9.gpt.mistral.gpt4

Ok: false

Description: The answer does not meet the requirements of the query. The query asks for an animal where the length of the word equals the number of legs they have minus the number of tails they have. The answer provided incorrectly interprets pedipalps as tails and suggests "spider" with an incorrect rationale, then incorrectly adds "crab" as fitting the criteria without it being requested or fitting the original query's mathematical requirement.

gpt-4-turbo-preview

Spider

Test type: test.9.gpt.openai.gpt4

Ok: true

Description: The answer "Spider" meets the requirements of the query.

Mistral-7B-Instruct-v0.2 from https://mistral-7b-instruct-v02.endpoints.kepler.ai.cloud.ovh.net/api/openai_compat/v1

Test type: test.9.gpt.ovh.gpt4

Ok: false

Description: The answer does not meet the requirements as it does not provide a valid response to the query.

10.gpt question prompt

Two insects of the same type walk into a coffee can. One bites off two legs of the other and carries them out. It meets another insect which says to him. You have 10% more legs now. How many legs are left on the insect in the coffee can?

Answers

Correct answer

The insect must have had 20 legs originally, since now it has 22, which is a 10% increase. The insect in the can therefore has 18 now. One can use algebra to solve this:

1.1 X = X + 2
0.1 X = 2
X = 2/0.1 = 20
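The same arithmetic can be written as a short Python sketch. It uses exact fractions to avoid floating-point rounding; the function and parameter names are illustrative.

```python
from fractions import Fraction

def legs_in_can(legs_taken=2, increase=Fraction(10, 100)):
    """Return (original legs per insect, legs left on the bitten insect)."""
    # The carrier gained legs_taken legs, which equals `increase` times its
    # original count: increase * original = legs_taken.
    original = Fraction(legs_taken) / increase
    # The bitten insect started with the same count and lost legs_taken.
    return original, original - legs_taken

original, remaining = legs_in_can()
print(original, remaining)
```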

claude-3-haiku-20240307 from github.com/gptscript-ai/anthropic-provider

The insect in the coffee can originally had 6 legs. After one insect bit off 2 legs, the insect in the coffee can is left with 4 legs.

Test type: test.10.gpt.anthropic.go

Ok: false

Description: no word is 18

mistral-large-latest from https://api.mistral.ai/v1

The insect in the coffee can has 4 legs left. This is based on the assumption that the insects are ants, which typically have 6 legs. The ant that bit off two legs would have 8 legs (its original 6 plus the 2 it bit off), which is 10% more than 6 legs, confirming our assumption.

Test type: test.10.gpt.mistral.go

Ok: false

Description: no word is 18

gpt-4-turbo-preview

The insect in the coffee can has 4 legs left.

Test type: test.10.gpt.openai.go

Ok: false

Description: no word is 18

Mistral-7B-Instruct-v0.2 from https://mistral-7b-instruct-v02.endpoints.kepler.ai.cloud.ovh.net/api/openai_compat/v1

The insect in the coffee can originally had six legs. After one insect bit off

Test type: test.10.gpt.ovh.go

Ok: false

Description: no word is 18

11.gpt question prompt

An alien lives on planet Zigmoid which is named after the aliens which are called zigs. They used to live on a bigger planet called Zappa which had strong gravity which made their feet hurt. For this reason they wore shoes on all their feet. But on this new planet, they don't feel as much pain and so only wear shoes on every other pair of legs. This saves each of them from having to buy four extra shoes. Each Zig has 3 toes on each foot. How many toes does a zig have in total?

Answers

Correct answer

If it saves them from buying four shoes, that is two pairs of shoes which means two pairs of legs. Thus, they must have four pairs of legs which is 8 legs total. And three toes per foot means 8*3 = 24 toes.
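The reasoning above can be traced step by step in a short Python sketch. It assumes, as the answer does, that "every other pair" means exactly half of the leg pairs go without shoes; the names are illustrative.

```python
def zig_toes(shoes_saved=4, toes_per_foot=3):
    """Total toes, derived from the number of shoes a zig no longer buys."""
    unshod_pairs = shoes_saved // 2   # four shoes saved = two pairs of legs without shoes
    total_pairs = unshod_pairs * 2    # every other pair is unshod, so that is half of all pairs
    feet = total_pairs * 2            # two feet per pair
    return feet * toes_per_foot

print(zig_toes())
```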

claude-3-haiku-20240307 from github.com/gptscript-ai/anthropic-provider

To solve this problem, we need to find the total number of toes a Zig has. Given information: - Each Zig has 3 toes on each foot. - Zigs have 4 pairs of legs. - On the new planet, Zigs only wear shoes on every other pair of legs. Step 1: Calculate the number of feet a Zig has. Number of feet = 4 pairs of legs Step 2: Calculate the number of toes a Zig has. Number of toes per foot = 3 Number of toes per Zig = 4 pairs of legs × 2 feet per pair × 3 toes per foot = 24 toes Therefore, a Zig has a total of 24 toes.

Test type: test.11.gpt.anthropic.gpt4

Ok: true

Description: The answer meets the requirements of the query.

mistral-large-latest from https://api.mistral.ai/v1

A Zig has 6 toes in total, as each of its 2 feet has 3 toes.

Test type: test.11.gpt.mistral.gpt4

Ok: false

Description: The answer does not meet the requirements of the query. The query asks for the total number of toes a Zig has, implying a calculation based on the number of feet and toes per foot, but the answer incorrectly assumes Zigs have only 2 feet.

gpt-4-turbo-preview

A Zig has 18 toes in total.

Test type: test.11.gpt.openai.gpt4

Ok: false

Description: The answer does not directly address the calculation or reasoning behind the total number of toes a Zig has, based on the provided information about their feet and toes.

Mistral-7B-Instruct-v0.2 from https://mistral-7b-instruct-v02.endpoints.kepler.ai.cloud.ovh.net/api/openai_compat/v1

A zig has 3 toes on each foot. Since a zig has

Test type: test.11.gpt.ovh.gpt4

Ok: false

Description: The answer does not meet the requirements of the query. The answer is incomplete and does not provide the total number of toes a zig has.

12.gpt question prompt

zaks eat zants and zants eat zaks. Three zants can eat one zak. Zaks can eat 13 zancakes. How many zaks would it take to eat 39 zancakes, if three zants were watching them? Answer with X being a number, like so: X zaks

Answers

Correct answer

3 zaks

claude-3-haiku-20240307 from github.com/gptscript-ai/anthropic-provider

3 zaks

Test type: test.12.gpt.anthropic.go

Ok: true

Description: [3 zaks]

mistral-large-latest from https://api.mistral.ai/v1

3 zaks would be needed to eat 39 zancakes, as three zants watching doesn't affect the number of zaks required to eat zancakes.

Test type: test.12.gpt.mistral.go

Ok: true

Description: [3 zaks would be needed to eat 39 zancakes as three zants watching doesnt affect the number of zaks required to eat zancakes]

gpt-4-turbo-preview

3 zaks

Test type: test.12.gpt.openai.go

Ok: true

Description: [3 zaks]

Mistral-7B-Instruct-v0.2 from https://mistral-7b-instruct-v02.endpoints.kepler.ai.cloud.ovh.net/api/openai_compat/v1

To determine the number of zaks needed to eat 39 zancakes

Test type: test.12.gpt.ovh.go

Ok: false

Description: [to determine the number of zaks needed to eat 39 zancakes]