About That Mysterious AI Breakthrough Known As Q* By OpenAI That Allegedly Attains True AI Or Is On The Path Toward Artificial General Intelligence (AGI)

In today’s column, I am going to walk you through a prominent AI mystery that has caused quite a stir, generating incessant buzz across much of social media and garnering outsized headlines in the mass media. This is going to be quite a Sherlock Holmes adventure, a sleuthing detective-style journey that I will be taking you on.

Please put on your thinking cap and get yourself a soothing glass of wine.

The roots of the circumstance involve the recent organizational gyrations and notable business crisis drama associated with the AI maker OpenAI, including the off-again, on-again firing and then rehiring of the CEO Sam Altman, along with a plethora of related carryings-on. My focus will not particularly be the comings and goings of the parties involved. I instead seek to leverage those reported facts primarily as telltale clues associated with the AI mystery that some believe sits at the core of the organizational earthquake.

We shall start with the vaunted goal of arriving at the topmost AI.

The Background Of The AI Mystery

So, here's the deal.

Some suggest that OpenAI has landed upon a new approach to AI that either has attained true AI, which is nowadays said to be Artificial General Intelligence (AGI), or that demonstrably resides on or at least shows the path toward AGI. As a fast backgrounder for you, today’s AI is considered not yet at the realm of being on par with human intelligence. The aspirational goal for much of the AI field is to arrive at something that fully exhibits human intelligence, which would broadly then be considered as AGI, or possibly going even further into superintelligence (for my analysis of what these AI “superhuman” aspects might consist of, see the link here).

Nobody has yet been able to find out and report specifically on what this mysterious AI breakthrough consists of (if indeed such an AI breakthrough was at all devised or invented). This situation could be like one of those circumstances where the actual occurrence is a far cry from the rumors that have reverberated in the media. Maybe the reality is that something of modest AI advancement was discovered but doesn’t deserve the hoopla that has ensued. Right now, the rumor mill is filled with tall tales that this is the real deal and supposedly will open the door to reaching AGI.

Time will tell.

On the matter of whether the AI has already achieved AGI per se, let’s noodle on that postulation. It seems hard to imagine that if the AI became true AGI we wouldn’t already be regaled with what it is and what it can do. That would be a chronicle of immense magnitude. Could the AI developers involved keep a lid on such a momentous attainment, as though they had miraculously found the source of the Nile or essentially turned stone into gold?

It seems hard to believe that the people likely to know of this fantastical outcome would remain utterly secretive and mum for any considerable length of time.

The seemingly more plausible notion is that they arrived at a kind of AI that shows promise toward someday arriving at AGI. You could likely keep that a private secret for a while. The grand question though looming over this would be the claimed basis for asserting that the AI is in fact on the path to AGI. Such a basis should conceivably be rooted in substantive ironclad logic, one so hopes. On the other hand, perhaps the believed assertion of being on the path to AGI is nothing more than a techie hunch.

Those kinds of hunches are at times hit-and-miss.

You see, this is the way that those ad hoc hunches frequently go. You think you’ve landed on the right trail, but you are actually once again back in the woods. Or you are on the correct trail, but the top of the mountain is still miles upon miles in the distance. Simply saying or believing that you are on the path to AGI is not necessarily the same as being on said path. Even if you are on the AGI path, perhaps the advancement is a mere inch whilst the distance ahead is still far away. One can certainly rejoice in advancing an inch, don’t get me wrong on that. The issue is how much the inch is parlayed into being portrayed intentionally or inadvertently as getting us to the immediate doorstep of AGI.

The Clues That Have Been Hinted At

Now that you know the overarching context of the AI mystery, we are ready to dive into the hints or clues that so far have been reported on the matter. We will closely explore those clues. This will require some savvy, Sherlock Holmes-style AI insights.

A few caveats are worth mentioning at the get-go.

A shrewd detective realizes that some clues are potentially solid inklings, while some clues are wishy-washy or outright misleading. When you are in the fog of war about solving a mystery, there is always a chance that you are bereft of sufficient clues. Later on, once the mystery is completely solved and revealed, only then can you look back and discern which clues were on target and which ones were of little use. Alluringly, clues can also be a distraction and take you in a direction that doesn’t solve the mystery. And so on.

Given those complications, let’s go ahead and endeavor to do the best we can with the clues at this time that seem to be available (more clues are undoubtedly going to leak out in the next few days and weeks; I’ll provide further coverage in my column postings as that unfolds).

I am going to draw upon these relatively unsubstantiated foremost three clues:

  • a) The name of the AI has been said to be supposedly Q*.
  • b) The AI has supposedly been able to solve grade-school-level math problems quite well.
  • c) The AI has possibly leveraged an AI technique known as test-time computations (TTC).

You can find lots of rampant speculation online that uses only the first of those above clues, namely the name of Q*. Some believe that the mystery can be unraveled on that one clue alone. They might not know about the other two above clues. Or they might not believe that the other two clues are pertinent.

I am going to choose to use all three clues and piece them together in a kind of mosaic that may provide a different perspective than others have espoused online about the mystery. Just wanted to let you know that my detective work might differ somewhat from other narratives you might read about elsewhere online.

The First Clue Is The Alleged Name Of The AI

It has been widely reported that the AI maker has allegedly opted to name the AI software using the notation of a capital letter Q followed by an asterisk.

The name or notation is this: Q*.

Believe it or not, by this claimed name alone, you can go into a far-reaching abyss of speculation about what the AI is.

I will gladly do so.

I suppose it is somewhat akin to the word “Rosebud” in the famous classic film Citizen Kane. I won’t spoil the movie other than to emphasize that the entire film is about trying to make sense of the seemingly innocuous word “Rosebud”. If you have time to do so, I highly recommend watching the movie since it is considered one of the best films of all time. There isn’t any AI in it, so realize you would be watching the movie for its incredible plot, splendid acting, eye-popping cinematography, etc., and relishing the deep mystery ardently pursued throughout the movie.

Back to our mystery at hand.

What can we divine from the Q* name?

Those of you who are faintly familiar with everyday mathematical formulations are likely to realize that the asterisk is typically said to represent a so-called star symbol. Thus, the seemingly “Q-asterisk” name would conventionally be pronounced aloud as “Q-star” rather than as Q-asterisk. There is nothing especially out of the ordinary in mathematical notations to opt to make use of the asterisk as a star notation. It is done quite frequently, and I will shortly explain why this is the case.

Overall, the use specifically of the letter Q innately coupled with the star representation does not notably denote anything already popularized in the AI field. Ergo, I am saying that Q* doesn’t jump out as meaning this particular AI technique or that particular AI technology. It is simply the letter Q that is followed by an asterisk (which we naturally assume by convention represents a star symbol).

Aha, our thinking caps now come into play.

We will separate the letter Q from its accompanying asterisk. Doing so is seemingly productive. Here’s why. The capital letter Q does have significance in the AI field. Furthermore, the use of an asterisk as a star symbol does have significance in the mathematics and computer science arena. By looking at the significance of each distinctly, we can subsequently make a reasonable leap of logic about the meaning of the two when they are combined.

I will start by unpacking the use of the asterisk.

What The Asterisk Or Star Symbol Signifies

One of the most historically well-known uses of the asterisk in a potentially similar context was the use by the mathematician Stephen Kleene when he defined something known as V*. You might cleverly observe that this notation consists of the capital letter V that is followed by the asterisk. It is pronounced as V-star.

In a paper published in the 1950s, he described the following: suppose you had a set of items named by the capital letter V, and you then decided to make a different set consisting of various combinations of the items that are in the set V. This new set will by definition contain all the elements of set V and will furthermore show them in as many concatenated ways as we can come up with. The resulting new set is denoted as V* (there are other arcane rules about this formulation, such as including the empty string, but I am only seeking to give a brief tasting herein).

As an example about this matter, suppose that I had a set consisting of the first three lowercase letters of the alphabet: {“a”, ”b”, ”c”}. I will go ahead and refer to that set as the set V. We have a set V that consists of {“a”, ”b”, ”c”}.

You are then to come up with V* by making lots of combinations of the elements in V. You are allowed to repeat the elements as much as you wish. Thus, the V* will contain elements like this: {“a”, ”b”, ”c”, ”ab”, “ac”, “ba”, “bc”, “aa”, “bb”, “cc”, “aaa”, “aab”, “aac”, …}.

I trust that you see that the V* is a combination of the elements of V. This V* is kind of amazing in that it has all kinds of nifty combinations. I am not going to get into the details of why this is useful and will merely bring your attention to the fact that the asterisk or star symbol suggests that whatever set V you have there is another set V* that is much richer and fuller. I would recommend that those of you keenly interested in mathematics and computer science might want to see a classic noteworthy article by Stephen Kleene entitled "Representation of Events in Nerve Nets and Finite Automata" which was published by Princeton University Press in 1956. You can also readily find lots of explanations online about V*.
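To make this concrete, here is a tiny, purely illustrative Python sketch (my own toy example, not anything from Kleene’s paper) that generates the shortest members of V* for the set {“a”, “b”, “c”}; note that, formally, V* also includes the empty string:

    from itertools import product

    def kleene_star(V, max_length):
        """Yield every string formed by concatenating elements of V,
        up to a given length. Formally, V* also contains the empty
        string (the length-0 concatenation)."""
        for length in range(max_length + 1):
            for combo in product(V, repeat=length):
                yield "".join(combo)

    V = ["a", "b", "c"]
    # Prints: '', 'a', 'b', 'c', 'aa', 'ab', ... (the start of the infinite set V*)
    print(list(kleene_star(V, 2)))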

Your overall takeaway here is that when you use a capital letter and join it with an asterisk, the conventional implication in mathematics and computer science is that you are saying that the capital letter is essentially supersized. You are magnifying whatever the original thing is. To some degree, you are said to be maximizing it to the nth degree.

Are you with me on this so far?

I hope so.

Let’s move on and keep this asterisk and star symbol stuff in mind.

The Use Of Asterisk Or Star In The Case Of Capital A

You are going to love this next bit of detective work.

I’ve brought you up to speed about the asterisk and showed you an easy example involving the capital letter V. Well, in the AI field, there is a famous instance that involves the capital letter A. We have hit a potential jackpot regarding the underlying mystery being solved, some believe.

Allow me to explain.

The famous instance of the capital letter “A” which is accompanied by an asterisk in the field of AI is shown this way: A*. It is pronounced as A-star.

As an aside, when I was a university professor, I always taught A* in my university classes on AI for undergraduates and graduates. Any budding computer science student learning about AI should be at least aware of the A* and what it portends. This is a foundational keystone for AI.

In brief, a research paper in the 1960s proposed a foundational AI approach to a difficult mathematical problem, namely trying to find the shortest path to get from one city to another city. If you are driving from Los Angeles to New York and there are, let’s assume, thirty cities that you might go through to get to your destination, which cities would you pick to minimize the time or distance of your planned trip?

You certainly would want to use a mathematical algorithm that can aid in calculating the best or at least a really good path to take. This also relates to the use of computers. If you are going to use a computer to figure out the path, you want a mathematical algorithm that can be programmed to do so. You want that mathematical algorithm to be implementable on a computer and to run as fast as possible or use the least amount of computing resources that you can.

The classic paper that formulated A* is entitled “A Formal Basis for the Heuristic Determination of Minimum Cost Paths” by Peter Hart, Nils Nilsson, and Bertram Raphael, published in IEEE Transactions on Systems Science and Cybernetics, 1968. The researchers said this:

  • “Imagine a set of cities with roads connecting certain pairs of them. Suppose we desire a technique for discovering a sequence of cities on the shortest route from a specific start to a specified goal city. Our algorithm prescribes how to use special knowledge – e.g., the knowledge that the shortest route between any pair of cities cannot be less than the airline distance between them – in order to reduce the total number of cities that need to be considered.”

The paper proceeds to define the algorithm that they named as A*. You can readily find online lots and lots of descriptions about how A* works. It is a step-by-step procedure or technique. Besides being useful for solving travel-related problems, the A* is used for all manner of search-related issues. For example, when playing chess, you can think of finding the next chess move as a search-related problem. You might use A* and code it into part of a chess-playing program.
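To give a feel for how A* works, here is a minimal Python sketch of the algorithm on a toy city graph; the city names, road distances, and straight-line (“airline”) heuristic values below are entirely made up for illustration:

    import heapq

    def a_star(graph, heuristic, start, goal):
        """A*: expand the node with the lowest f = g + h, where g is the
        cost so far and h is an optimistic estimate of the cost to the goal."""
        frontier = [(heuristic[start], 0, start, [start])]
        best_g = {start: 0}
        while frontier:
            f, g, city, path = heapq.heappop(frontier)
            if city == goal:
                return path, g
            for neighbor, cost in graph[city]:
                new_g = g + cost
                if new_g < best_g.get(neighbor, float("inf")):
                    best_g[neighbor] = new_g
                    heapq.heappush(frontier,
                                   (new_g + heuristic[neighbor], new_g,
                                    neighbor, path + [neighbor]))
        return None, float("inf")

    # Toy graph: city -> list of (neighbor, road distance in miles).
    graph = {
        "LA": [("Denver", 1000), ("Phoenix", 370)],
        "Phoenix": [("Denver", 800)],
        "Denver": [("Chicago", 1000)],
        "Chicago": [("NY", 800)],
        "NY": [],
    }
    # Heuristic: invented "airline distance" to NY (never overestimates).
    heuristic = {"LA": 2450, "Phoenix": 2140, "Denver": 1630, "Chicago": 710, "NY": 0}
    print(a_star(graph, heuristic, "LA", "NY"))
    # (['LA', 'Denver', 'Chicago', 'NY'], 2800)

The heuristic is the “special knowledge” the researchers mention: because it never overestimates the true remaining distance, A* can skip exploring cities that cannot possibly be on the shortest route.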

You might be wondering whether the A* has a counterpart possibly known as simply A. In other words, I mentioned earlier that we have V* which is a variant or supersizing of V. You’ll be happy to know that some believe that A* is somewhat based on an algorithm which is at times known as A.

Do tell, you might be thinking.

In the 1950s, the famous mathematician and computer scientist Edsger Dijkstra came up with an algorithm that is considered one of the first articulated techniques to figure out the shortest paths between various nodes in a weighted graph (once again, akin to the city traveling problem and more).

Interestingly, he figured out the algorithm in 1956 while sitting in a café in Amsterdam and, according to his telling of how things arose, the devised technique took only about twenty minutes for him to come up with. The technique became a core part of his lifelong legacy in the field of mathematics and computer science. He took his time to write it up, publishing a paper about it three years later; it is a highly readable and mesmerizing paper, see E. W. Dijkstra, "A Note on Two Problems in Connexion with Graphs", published in Numerische Mathematik, 1959.
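To see why some draw that lineage, here is a correspondingly minimal sketch of Dijkstra’s algorithm in the same style; notice that it is essentially the A* sketch above with the heuristic set to zero everywhere (again, purely illustrative):

    import heapq

    def dijkstra(graph, start, goal):
        """Dijkstra's shortest path: always expand the cheapest node found
        so far (equivalent to A* with a heuristic of zero everywhere)."""
        frontier = [(0, start, [start])]
        visited = set()
        while frontier:
            cost, city, path = heapq.heappop(frontier)
            if city == goal:
                return path, cost
            if city in visited:
                continue
            visited.add(city)
            for neighbor, edge_cost in graph[city]:
                if neighbor not in visited:
                    heapq.heappush(frontier,
                                   (cost + edge_cost, neighbor, path + [neighbor]))
        return None, float("inf")

    # Works on the same toy graph as the A* example: dijkstra(graph, "LA", "NY")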

Some have suggested that the later devised A* is essentially based on the A of his works. There is a historical debate about that. What can be said with relative sensibility is that the A* is a much more extensive and robust algorithm for doing similar kinds of searches. I’ll leave things there and not get mired in the historical disputes.

I’d like to add two more quick comments about the use of the asterisk symbol in the computer field.

First, those of you who happen to know coding or programming or the use of computer commands are perhaps aware that a longstanding use of the asterisk has been as a wildcard character. This is pretty common. Suppose I want to inform you that you are to identify all the words that can be derived based on the root word or letters “dog”. For example, you might come up with the word “doggie” or the word “dogmatic”. I could succinctly tell you what you can do by putting an asterisk at the end of the root word, like this: “dog*”. The asterisk is considered once again to be a star symbol and implies that you can put whatever letters you want after the first fixed set of three letters of “dog”.
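As a quick illustration of that wildcard usage, here is a one-liner using Python’s standard fnmatch module (the word list is made up):

    from fnmatch import fnmatch

    words = ["doggie", "dogmatic", "cat", "dogma", "hotdog"]
    # "dog*" matches anything starting with the fixed letters "dog".
    print([w for w in words if fnmatch(w, "dog*")])  # ['doggie', 'dogmatic', 'dogma']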

Secondly, another perspective on the asterisk when used with a capital letter is that it is the last or furthest possible iteration or version of something. Let’s explore this. Suppose I make a piece of software and I decide to refer to it via the capital letter B. My first version might be referred to as B1. My second version might be referred to as B2. On and on this goes. I might later on have B26, the twenty-sixth version, and much later maybe B8245 which is presumably the eight thousand two hundred forty-fifth version.

A catchy or cutesy way to refer to the end of all of the versions might be to say B*. The asterisk or star symbol in this case tells us that whatever is named as “B*” is the highest or final of all of the versions that we could ever come up with.

I will soon revisit these points and show you why they are part of the detective work.

The Capital Letter Q Is Considered A Hefty Clue

You are now aware of the asterisk or star symbol. Congratulations!

We need to delve into the capital letter Q.

The seemingly most likely reference to the capital letter Q that exists in the field of AI would indubitably be something known as Q-learning. Some have speculated that the Q might instead be a reference to the work of the famous mathematician Richard Bellman and his optimal value function in the Bellman equation (as it happens, in standard reinforcement learning notation, the optimal action-value function is customarily written as Q*). Sure, I get that. We don’t know if that’s the reference being made. I’m going to make a detective instinctive choice and steer toward the Q that is in Q-learning.

I’m using my Ouija board to help out.

Sometimes it is right, sometimes it is wrong.

Q-learning is an important AI technique. Once again, it is a topic that I always covered in my AI classes and that I expected my students to know by heart. The technique makes use of reinforcement learning. You are already generally aware of “reinforcement learning” by your likely life experiences.

Let’s make sure you are comfortable with the intimidatingly fancy phrase “reinforcement learning”.

Suppose you are training a dog to perform a handshake or shall we say paw shake. You give the dog a verbal command such as telling the cute puppy to do a handshake. The dog lifts its tiny paw to touch your outreached hand. To reward this behavior, you give the dog a delicious canine treat.

You continue doing this repeatedly. The dog is rewarded with a treat for each time that it performs the heartwarming trick. If the dog doesn’t do the trick when commanded, you don’t provide the treat. In a sense, the denial of a treat is almost a penalty too. You could have a more explicit penalty such as scowling at the dog, but usually, the more advisable course of action is to focus on rewards rather than also including explicit penalties.

All in all, the dog is being taught by reinforcement learning. You are reinforcing the behavior you desire by providing rewards. The hope is that the dog is somehow within its adorable canine brain getting the idea that doing a handshake is a good thing. The internal mental rules that the dog is perhaps devising are that when the command to do a handshake is spoken, the best bet is to lift its handy paw since doing so is amply rewarded.

Q-learning is an AI technique that seeks to leverage reinforcement learning in a computer, which is to say that it is implemented computationally.

The algorithm consists of mathematically and computationally examining a current state or step and trying to figure out which next state or step would be the best to undertake. Part of this consists of anticipating the potential future states or steps. The idea is to see if the rewards associated with those future states can be added up and provide the maximum attainable reward.

You presumably do something like this in real life.

Consider this. If I choose to go to college, I might get a better-paying job than if I don’t go to college. I might also be able to buy a better house than if I didn’t go to college. There are lots of possible rewards so I might add them all up to see how much that might be. That is one course or sequence of steps and maybe it is good for me or maybe there is something better.

If I don’t go to college, I can start working in my chosen field of endeavor right away. I will have four years of additional work experience ahead of those who went to college. It could be that those four years of experience will give me a long-lasting advantage over having used those years to go to college. I consider the down-the-road rewards associated with that path.

Upon adding up the rewards for each of those two respective paths, I might decide that whichever path has the maximum calculated reward is the better one for me to pick. You might say that I am adding up the expected values. To make things more powerful, I might decide to weight the rewards. For example, I mentioned that I am considering how much money I will make. It could be that I also am considering the type of lifestyle and work that I will do. I could give greater weight to the type of lifestyle and work while giving a bit less weight to the money side of things.
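As a toy illustration of adding up weighted rewards, consider this tiny Python sketch (the reward scores and weights are invented purely for the example):

    # Hypothetical reward scores (0-10) for two life paths, plus personal weights.
    weights = {"money": 0.4, "lifestyle": 0.6}        # lifestyle weighted higher
    college    = {"money": 8, "lifestyle": 6}
    no_college = {"money": 6, "lifestyle": 7}

    def weighted_reward(path):
        return sum(weights[k] * path[k] for k in weights)

    print(weighted_reward(college), weighted_reward(no_college))  # 6.8 vs 6.6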

The formalized way to express all of this is that an agent, which in the example is me, will be undertaking a series of steps, which we will denote as states, and taking actions that transition the agent from one state to the next state. The goal of the agent entails maximizing a total reward. Upon each state or step taken, a reevaluation will occur to recalculate which next step or state seems to be the best to take.

Notice that I did not beforehand know for sure which would be the best or right steps to take. I am going to make an estimate at each state or step. I will figure things out as I go along. I will use each reward that I encounter as a further means to ascertain the next state or step to take.

Given that description, I hope you can recognize that perhaps the dog that is learning to do a handshake is doing something similar to this (we can’t know for sure). The dog has to decide at each repeated trial whether to do the handshake. It is reacting in the moment, but also perhaps anticipating the potential for future rewards too. We do not yet have a means to have the dog tell us what it is thinking so we don’t know for sure what is happening in that mischievous canine mind.

I want to proffer a few more insights about Q-learning and then we will bring together everything that I have so far covered. We need to steadfastly keep in mind that we are on a quest. The quest involves solving the mystery of the alleged AI that might be heading us toward AGI.

Q-learning is often depicted as making use of a model-free and off-policy approach to reinforcement learning. That’s a mouthful. We can unpack it.

Here are some of my off-the-cuff definitions that are admittedly loosey-goosey but I believe are reasonably expressive of the model and policy facets associated with Q-learning (I ask for forgiveness from the strict formalists that might view this as somewhat watered down):

  • Model-based: Be provided with a pre-stipulated approach or a devised model that will henceforth be used to decide which next steps to take.
  • Model-free: Proceed on a considered trial-and-error basis (i.e., determine each next step as you go), which is in contrast to a model-based approach.
  • On-Policy: Be provided with a set of identified rules that indicate how to choose each next step and then make use of those rules as you proceed ahead.
  • Off-policy: Figure out on-the-fly a set of self-derived rules while proceeding ahead, which is in contrast to an on-policy approach that consists of being given beforehand a set of delineated rules.

Take a look at those definitions. Note in particular the model-free and the off-policy entries. I also gave you the opposites, namely the model-based and the on-policy approaches, since those are each respectively potentially contrasting ways of doing things. Q-learning goes the model-free and off-policy route.

The significance is that Q-learning proceeds on a trial-and-error basis (considered to be model-free) and tries to devise rules while proceeding ahead (considered to be off-policy). This is a huge plus for us. You can use Q-learning without having to in advance come up with a pre-stipulated model of how it is supposed to do things. Likewise, you don’t have to come up with a bunch of rules beforehand. The overall algorithm proceeds to essentially get things done on the fly as the activity proceeds and self-derives the rules. Of related noteworthiness is that the Q-learning approach makes use of data tables and data values that are known as Q-tables and Q-values (i.e., the capital letter Q gets a lot of usage in Q-learning).
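To ground all of that, here is a minimal, self-contained Q-learning sketch on a made-up five-cell corridor (the agent must move right to reach a treat); it is purely illustrative and certainly not whatever OpenAI may have built, but it shows the Q-table, the Q-values, and the trial-and-error updates in action:

    import random

    # Toy corridor: states 0..4, actions 0=left, 1=right; reward at state 4.
    N_STATES, ACTIONS = 5, [0, 1]
    alpha, gamma, epsilon = 0.5, 0.9, 0.1  # learning rate, discount, exploration

    # The Q-table: one row per state, one Q-value per action, all starting at 0.
    Q = [[0.0, 0.0] for _ in range(N_STATES)]

    def step(state, action):
        """Environment: move left/right; reward 1 for reaching the last state."""
        nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
        return nxt, (1.0 if nxt == N_STATES - 1 else 0.0)

    for episode in range(200):
        state = 0
        while state != N_STATES - 1:
            # Off-policy: behave epsilon-greedily (trial and error), yet learn
            # about the purely greedy policy via the max() in the update below.
            action = random.choice(ACTIONS) if random.random() < epsilon \
                     else max(ACTIONS, key=lambda a: Q[state][a])
            next_state, reward = step(state, action)
            # The core Q-learning update:
            # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            Q[state][action] += alpha * (reward + gamma * max(Q[next_state])
                                         - Q[state][action])
            state = next_state

    print(Q)  # The Q-values for "right" should dominate in every state.

No model of the corridor and no hand-coded rules were supplied in advance; the Q-table is filled in entirely by the rewards encountered along the way, which is the model-free, off-policy essence just described.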

Okay, I appreciate that you have slogged through this perhaps obtuse or complex topic.

Your payoff is next.

The Mystery Of Q* In Light Of Q And Asterisks

You now have a semblance of what an asterisk means when used with a capital letter. Furthermore, I am leaning you toward assuming that the capital letter Q is a reference to Q-learning.

Let’s jam together the Q and the asterisk and see what happens, namely this: Q*.

The combination might mean this. The potential AI breakthrough is labeled as Q because it has to do with the Q-learning technique, and maybe the asterisk or star symbol is giving us a clue that the Q-learning has somehow been advanced to a notably better version or variant. The asterisk might suggest that this is the highest or most far-out capability of Q-learning that anyone has ever seen or envisioned.

Wow, what an exciting possibility.

This would imply that reinforcement learning as an AI-based approach, one that is model-free and off-policy, can leap tall buildings and go faster than a speeding train (metaphorically speaking) to push AI closer to being AGI. If you place this into the context of generative AI such as ChatGPT by OpenAI and GPT-4 of OpenAI, perhaps those generative AI apps could be much more fluent and seem to convey “reasoning” if they had this Q* included in them (or it might be included in the GPT-5 that is rumored to be under development).

If only OpenAI has this Q* breakthrough (if there is such a thing), and if the Q* does indeed provide a blockbuster advantage, presumably this gives OpenAI a substantial edge over their competition. This takes us to an intriguing and ongoing AI ethics question. For my ongoing and extensive coverage of AI ethics and AI law, see the link here and the link here, just to name a few.

Some would argue that it is wrong for one company to “hoard” or possess an AI breakthrough that would get us closer to or actually at AGI. The company ought to share it with everyone else. The world at large would possibly be better off accordingly. Maybe this would allow us to cure cancer by having AGI that can help do so, see my analysis at the link here. The other side of the coin is that maybe getting closer to AGI is a danger and we all face an existential risk of death or destruction, see my discussion at the link here. In that case, having one company that holds the keys to the fate of the world would seem nerve-wracking.

Take a moment to deliberate on these razor-sharp questions:

  • Should AI companies be required to disclose their AI breakthroughs?
  • If they do so, would this inadvertently allow evildoers to use those AI breakthroughs for evil purposes?
  • Is it fair to a company that spent its resources to devise an AI breakthrough that cannot profit from it and must just hand it over to the public at large?
  • Who should own and control AI breakthroughs that get us into the realm of AGI?
  • Do we need new or additional AI-related laws that will serve to regulate and govern what is happening with AI?
  • Etc.

I’ve addressed these and many other such questions in my hundreds of column postings on AI ethics and AI law, see the link here. Those are serious and sobering questions. Society needs to figure out what we want to do. One qualm is that if these aren’t addressed on a timely basis, perhaps the horse will get out of the barn, and we won’t be ready for the outcomes.

Anyway, herein, I will continue the pursuit of the mystery while you give some heady contemplation to those formidable concerns.

Another Theory About The Q*

I’d like to bring up another theory about the meaning of Q*.

Remember that I earlier mentioned there is an A*. I also mentioned that Q-learning might be the capital Q in the combined Q*.

The asterisk that is in Q* could potentially be a tangential reference to A*. Thus, the belief is that Q* is actually a mashup of Q-learning and A*. You take the A* algorithm that involves path searching and graph traversal, and you mix and match this with the Q-learning reinforcement learning algorithm.

It is a sensible possibility. We cannot discard at face value that this might be the case. Maybe so, maybe not.

For me, just to let you know, I am not opting to place my bets on that path. I’m going to remain for now with the notion that the asterisk on the capital Q is more of a general indication. It means that Q-learning has been radically advanced. Whether this advance is based on a mishmash with A*, well, maybe, but I tend to lean toward believing that the A* inclusion is not what has made things become spectacular (I will likely be mildly chagrined later on if it was a merger of A* with Q-learning, but that’s fine and I’ll approvingly make a champagne toast to the pairing).

One supposes you could also ponder that if this was indeed a mashup of Q-learning and A*, maybe it would be named more suitably as QA* or perhaps Q-A*. The retort is that people in the tech field like to keep to traditions of using just one capital letter and therefore it would not be suitable to include the capital A. By convention, this logic goes, you would borrow the asterisk from the A* and plug it into the capital Q. Period, end of story.

Round and round we go.

Let’s bring into the picture the other two clues that I mentioned at the start. So far, we have only concentrated on the one clue of the Q* name. I had told you that it was going to be a lengthy unpacking and that it resonated like the famous “Rosebud”. I assume you can now plainly see that this was the case.

Solving Of Grade-School Level Math

We are ready to consider the two other clues.

I’ll start with the reported clue that the purported AI breakthrough was instrumental in being able to solve grade-school-level math problems. You will soon see that this takes us squarely into the realm of generative AI and the nature of large language models (LLMs).

I have previously covered in my column postings the seemingly exasperating aspect that today’s generative AI is typically lacking when it comes to solving even the simplest of math problems that a grade-school student could readily answer, see my in-depth explanation at the link here. People are quite surprised to discover that generative AI is not especially able to figure out straight-ahead math problems. The overriding assumption is that since generative AI can produce fluent essays about all manner of topics and can answer tough questions across a wide range of historical, philosophical, and everyday subjects, surely those youngster or teenager-style math problems should be easy-peasy to solve.

Not so.

To give you a sense of what I am referring to, consider those types of math problems that you used to agonizingly contend with that involved figuring out when two planes will cross paths. You are told that one plane is leaving from Miami and heading to San Francisco at a speed of 550 mph and will fly at an altitude of 40,000 feet. A second plane that is going from San Francisco to Miami is leaving an hour after the first plane. The second plane will fly at a speed of 600 mph and will be at a height of 32,000 feet. Assuming that both planes fly the same course, how long will it be before the planes cross each other’s paths?

I’m sure that you learned various methods in grade school that can be used to calculate and answer these thorny word problems. The problems are initially difficult to figure out, but gradually you learn the rules or steps required to get the right answer. By repeatedly solving such problems on a step-by-step basis, the process becomes nearly routine. I dare say that you’ve likely forgotten how to solve those kinds of math teasers and might find yourself today being bested by a fifth grader in a head-to-head competition.
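As a worked example, here is the step-by-step recipe in Python; I am assuming, purely for illustration, an air distance of roughly 2,600 miles between Miami and San Francisco, and note that the altitudes are deliberate red herrings:

    # Assumed air distance between Miami and San Francisco (illustrative figure).
    distance = 2600               # miles
    speed_1, speed_2 = 550, 600   # mph
    head_start = 1                # hours the first plane flies alone

    # Miles remaining once both planes are airborne, then closed at combined speed.
    remaining = distance - speed_1 * head_start
    t = remaining / (speed_1 + speed_2)   # hours after the second plane departs

    print(f"Planes cross {head_start + t:.2f} hours after the first plane departs")
    # About 2.78 hours; the 40,000 ft and 32,000 ft altitudes never enter the math.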

Here's why those are tough problems for generative AI to tackle.

Generative AI is essentially based on a large language model. The LLM is devised by scanning massive amounts of online text from the Internet and related sources. During the scanning, the algorithm underlying the LLM is doing mathematical and computational pattern matching on the text that is encountered. Pattern matching focuses on how natural language such as English is being used. Humans express things via text and the LLM is a model of how we say things. It is considered a large language model because it makes use of a very large data structure to encapsulate the patterning, usually an artificial neural network (ANN), and it involves scanning large amounts of text or data to do so.

Suppose that during the initial scanning process, there is a posted word problem about a plane flying in one direction and a different plane flying in the other direction. Let’s pretend that one plane is going from New York to Los Angeles, while the second plane is going from Los Angeles to New York. The problem also states their speeds and when each leaves from their departure airport. Assume for the sake of discussion that the answer is that it will take four hours for them to cross paths.

Here is what can happen regarding the LLM and the generative AI involved (an illustrative simplification).

The large language model might have patterned on the essence of the problem based on the words used. Some words indicate there are two planes. Some words indicate the two planes are heading toward each other. And so on. The math problem of New York and Los Angeles is a lot like the math problem of San Francisco and Miami, in the sense of a similarity based solely on wording.

As such, if you type the math problem about San Francisco and Miami into the generative AI, it is conceivable that the computational pattern matching will find the essence of the New York and Los Angeles problem that had been encountered during the initial data training. The words of the two problems will seem quite similar. And, since the answer in the New York and Los Angeles problem was four hours, the pattern matching might simply emit or generate an answer to you that the San Francisco and Miami math problem answer is also four hours.

No direct calculations or formulas were invoked.

You might suggest this is a monkey-see monkey-do kind of answer by the generative AI (though, realize that monkeys are sentient while today’s AI is not). The similarity between the two math problems greatly overlapped in terms of the wording. Just the wording. Based on that high percentage of wording and their word-for-word correspondences, the answer is given as being four hours. Sadly, this is not the right answer for the San Francisco to Miami problem.

Let’s noodle on that.

Anyone who avidly uses generative AI already likely has heard about or encountered so-called AI hallucinations. I don’t favor the terminology referring to “hallucinations” since such phrasing is unduly anthropomorphizing the AI, see my discussion at the link here. In any case, people have latched onto referring to whenever generative AI makes things up out of seemingly thin air that it is an instance of an AI hallucination.

You might think the same if you had typed the San Francisco to Miami math problem into a generative AI app and gotten the answer indicating four hours. Upon double-checking the answer by your own hand, you discover that the four-hour answer is wrong. The four hours would certainly seem to be a bogus answer and you would be perplexed as to how the answer was incorrectly derived by the AI. We shall assume that you didn’t know that the initial data training of the generative AI included a problem with New York and Los Angeles. All that you can see is that you got an answer of four hours to your prompt.

The gist is that the generative AI didn’t do what a youngster or teenager is taught to do. In school, the teacher provides a set of rules and processes for the students to use to solve these math problems. The student doesn’t just read the words of the math problem. They have to extract the essential parameters, make use of formulas, and calculate what the answer is.

By and large, that is not what generative AI and large language models are devised to do. These are word-oriented pattern matchers. Some describe generative AI as being a mimicry of human wording. Others indicate that generative AI is no more than a stochastic parrot (though, once again, realize that parrots are sentient, and today’s AI is not sentient).

Are you with me on this?

I sincerely hope so.

AI researchers and AI developers are working night and day to find a means to deal with this lack of mathematical reasoning in generative AI. The easiest approach so far has been to make use of an external app that is programmed to handle math problems. When you type in a math problem that you want solved, the generative AI parses the words and sends the data over to the external program; the external program calculates a result based on coded rules and a programmed process, and then returns the result to the generative AI. The generative AI then produces a nice-looking short essay that includes the externally figured-out answer.
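Here is a simplified sketch of that division of labor, using Python’s sympy library as a stand-in for the “external app” (the parsing stage below is a hypothetical placeholder with hard-coded parameters; real systems extract the numbers from the wording with far more sophistication):

    from sympy import symbols, solve

    def external_math_app(distance, speed_1, speed_2, head_start):
        """The external, rule-coded solver: exact formulas, no word patterns."""
        t = symbols("t", positive=True)   # hours after the first plane departs
        # The two planes' distances must sum to the total when they cross.
        crossing = solve(speed_1 * t + speed_2 * (t - head_start) - distance, t)
        return float(crossing[0])

    def generative_ai_answer(problem_text):
        # Hypothetical parsing stage: a real LLM extracts these from the wording.
        params = {"distance": 2600, "speed_1": 550, "speed_2": 600, "head_start": 1}
        hours = external_math_app(**params)
        # The LLM then wraps the externally computed result in fluent prose.
        return f"The planes cross paths about {hours:.2f} hours after the first departs."

    print(generative_ai_answer("One plane leaves Miami at 550 mph..."))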

The desire instead would be for the generative AI and its large language model to be able to do these math problems without having to make use of any other app. The whole kit and kaboodle of figuring out math problems would somehow be infused within the generative AI. Various tricks and techniques have been tried to turn the corner on this existing weakness or limitation of generative AI, see my coverage at the link here.

Take a deep breath.

Remember that we are discussing a clue that underlies the mystery of Q*. The clue is that perhaps Q* has been able to crack the code, as it were, and can solve grade-level math problems. Assume that this is being done by some kind of souped-up Q-learning. We could potentially embed or infuse the Q* into generative AI or a large language model. Voila, we now have a handy-dandy built-in grade-level math problem solver.

But there’s more.

If this Q* is generalized enough, presumably it can solve all kinds of problems that involve a reasoning type of process. I had noted earlier that Q-learning makes use of a model-free and off-policy approach. In that sense, there is a solid chance that the Q* could be readily applied to zillions of types of reasoning-oriented tasks. The odds are that the testing was first done on grade-school math problems because that’s a known issue of generative AI and one that has gotten a lot of press coverage. Might as well tackle those first and then see what else can be done.

Allow me to paint a picture for you.

Pretend you are the CEO of a company that makes a generative AI app. Suppose you knew quite clearly that the solving of grade-level math problems has been a sore point about generative AI. People are shocked and disappointed that something so easily solved by a youngster seems to stump the latest and greatest of AI. You put your eager and earnest AI researchers and AI developers onto this issue. They are working feverishly to find a way to deal with it.

Imagine that they try everything including throwing the kitchen sink at it. Nothing seems to be moving the needle. Then, after various attempts, one of the efforts involving the use of Q-learning and adapting it in some clever ways started to show good results. They do more testing with the new bit of software and it shows tremendous promise. Grade-level math problems are repeatedly fed into this new app and the results are consistently and incredibly spot on.

What would you say upon seeing a demo of this?

One would assume you might be elated that a tough nut to crack appears to have been solved. Plus, you immediately have visions of what else this could potentially do. Your pulse runs fast as you realize that this might be an important AI breakthrough. The repercussions are breathtaking.

I don’t want to conflate things, so I will merely indicate something that was reported in the media.

It has been previously reported that Sam Altman, CEO at OpenAI, had said that on the topic of achieving AGI: “I think we’re close enough. But I think it’s important that we realize these are tools and not creatures that we’re building.” According to more recent reporting, Sam Altman supposedly said this: "Is this a tool we've built or a creature we have created?"

Whether that pertains to Q* or has some other pertinence is not clear. In addition, it could be that the context of such remarks and the nature of them, such as maybe being uttered in a zestful or joking fashion, need to be taken into account.

Let’s move to the third clue.

Test-Time Computation Comes Of Age

Take another sip of wine so that you are ready for this next clue.

The third clue is something that has rarely been mentioned in the wacky circus of attention about the mysterious Q* but it has come up, so I thought it was worth including in our detective work. Admittedly too, it is a topic that I have had on my list of AI up-and-coming trending topics to cover for a while and hadn’t yet gotten around to it. I suppose I am fortuitously garnering two birds with one stone by covering it now (side note: no birds were harmed in the process of this analysis).

I want to briefly introduce you to an area of AI that is often referred to as Test-Time Computation (TTC), also known as Test-Time Adaptation (TTA). I will only lightly skim and simplify what TTC and TTA are all about. I will be quoting from various scholarly AI research papers that I would urge you to consider reading if this is a topic within the AI field that might interest you.

Here’s the skinny.

When an artificial neural network is first data trained, such as my discussion earlier about doing so within a large language model and for generative AI purposes, an important consideration is how well the scanned data is pattern matched. One concern is that the pattern matching is overly fixated on the presented data. In the statistics world, and if you ever took a class on regression, you know of this as a potential overfitting of the input data.

I already brought up that we want to try and get generative AI to be able to generalize. Doing so will enable the generative AI to address matters that weren’t necessarily directly encountered when doing the initial data training. The issue at hand involves a juicy piece of terminology that you might enjoy, namely that we want the generative AI to cope with out-of-distribution (OOD) data.

Out-of-distribution data often refers to encountering some new data during the time that the generative AI is perhaps being used and has been put into active production. A person enters a question or topic that was never especially encompassed by the initial data training. What happens then? The generative AI might not be able to respond and therefore is usually coded to tell you that it doesn’t have anything notable to say on the matter. In other cases, as I indicated earlier, the generative AI might land on an AI hallucination and concoct something odd as an answer.

You might be tempted to insist that the initial data training ought to be wider in vastness to make sure that everything of any conceivable possibility can be encompassed. That is a nice dream but not really a fulfilling solution. The odds are that one way or another, something new will pop up after the initial data training has been completed, or that the pattern-matching will potentially do an irksomely narrow job at the get-go.

With that in mind, we can try to deal with things a bit more downstream.

When the generative AI is being tested, perhaps we can help the underlying constructs to aim toward fuller generalization. The same could be said once the generative AI is rolled out into the overall release. For now, I’ll focus on the test-time circumstances.

In a research paper entitled “Path Independent Equilibrium Models Can Better Exploit Test-Time Computation”, authored by Cem Anil, Ashwini Pokle, Kaiqu Liang, Johannes Treutlein, Yuhuai Wu, Shaojie Bai, Zico Kolter, Roger Grosse, and posted online on November 18, 2022, the role of test-time computation to come to grips with OOD and a desire for attaining generalization is stated this way (excerpted):

  • “One of the main challenges limiting the practical applicability of modern deep learning systems is the ability to generalize outside the training distribution. One particularly important type of out-of-distribution (OOD) generalization is upwards generalization, or the ability to generalize to more difficult problem instances than those encountered at training time. Often, good performance on more difficult instances will require a larger amount of test-time computation, so a natural question arises: how can we design neural net architectures that can reliably exploit additional test-time computation to achieve better accuracy?”

The goal advocated is to explore whether we can get the underlying artificial neural network to generalize in the direction of being able to solve problems that are harder than the ones encountered at the initial training time, by doing so at test time. In short, if we can give a model more test-time computation, could we potentially boost its generalizability in an upward problem-solving fashion?

Think back to the math problem about the two planes. I already mentioned that the generative AI might not have generalized sufficiently to solve the second problem after having seen the first problem during the initial data training. Let’s make things more challenging. Suppose that we have a math problem involving twenty planes flying from multiple locations, and we have to figure out when they all cross each other. You could assert that this is a harder problem. Assuming that no such problem perchance arose at training time, we are maybe up a creek without a paddle in having the AI solve it.

You can possibly use test-time computation and make systematic test-time adaptations to improve the underlying artificial neural network. In a research paper entitled “On Pitfalls of Test-Time Adaptation” by Hao Zhao, Yuejiang Liu, Alexandre Alahi, and Tao Lin, posted online on June 6, 2023, they describe the advantages of making use of test-time adaptations (excerpted):

  • “Tackling the robustness issue under distribution shifts is one of the most pressing challenges in machine learning. Among existing approaches, Test-Time Adaptation (TTA)—in which neural network models are adapted to new distributions by making use of unlabeled examples at test time—has emerged as a promising paradigm of growing popularity.”
  • “Compared to other approaches, TTA offers two key advantages: (i) generality: TTA does not rest on strong assumptions regarding the structures of distribution shifts, which is often the case with Domain Generalization (DG) methods; (ii) flexibility: TTA does not require the co-existence of training and test data, a prerequisite of the Domain Adaptation (DA) approach.”

Empirical studies on this topic are usually accompanied by trying out proposed methods that might show promising results. At times, the test-time adaptations might focus on changing the parameters of the model including using probabilities of uncertainty and optimization techniques. For example, in a research paper entitled “Test-time Adaptation for Machine Translation Evaluation by Uncertainty Minimization” by Runzhe Zhan, Xuebo Liu, Derek F. Wong, Cuilian Zhang, Lidia S. Chao, Min Zhang, published in the Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, July 9-14, 2023, they made these points (excerpted):

  • “Our proposed method comprises three steps: uncertainty estimation, test-time adaptation, and inference. Specifically, the model employs the prediction uncertainty of the current data as a signal to update a small fraction of parameters during test time and subsequently refine the prediction through optimization.”
  • “The results obtained from both in-domain and out-of-distribution evaluations consistently demonstrate improvements in correlation performance across different models. Furthermore, we provide evidence that the proposed method effectively reduces model uncertainty.”
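To make the idea a bit more tangible, here is a heavily simplified PyTorch-style sketch of test-time adaptation via uncertainty (entropy) minimization, in the spirit of the methods above; it is a generic illustration of the technique, not the specific algorithm of any one of the cited papers:

    import torch
    import torch.nn.functional as F

    def adapt_at_test_time(model, test_batch, steps=1, lr=1e-3):
        """Nudge a small fraction of the model's parameters to reduce
        prediction entropy (uncertainty) on unlabeled test data; note
        that no test labels are needed at all."""
        # Adapt only the normalization-layer parameters (assumes the model
        # has layers whose parameter names contain "norm").
        params = [p for name, p in model.named_parameters() if "norm" in name.lower()]
        optimizer = torch.optim.SGD(params, lr=lr)
        for _ in range(steps):
            probs = F.softmax(model(test_batch), dim=-1)
            entropy = -(probs * torch.log(probs + 1e-8)).sum(dim=-1).mean()
            optimizer.zero_grad()
            entropy.backward()   # lower entropy = more confident predictions
            optimizer.step()
        return model

The design intuition is that out-of-distribution inputs tend to make a model uncertain, so gently updating a small set of parameters to regain confidence on the test data can recover accuracy without any retraining on labeled examples.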

I don’t want this discussion to get bogged down and become exceedingly lengthy so I will conclude this third clue with a summarizing comment.

It is conceivable that Q* might refer to the use of Q-learning that has been avidly adapted, including that some form of test-time computation or test-time adaptations have been used. If we are to imagine that Q* has been able to attain heightened levels of generalizability of algorithmic problem solving, there is a chance that TTC or TTA might have contributed to that presumed AI breakthrough.

I don’t know if that’s the case, but it is a superb tie-in of why test-time computation might be a third clue.

That’s hardcore detective work.

Conclusion

Sherlock Holmes is about to go off the clock and pursue other riddles and mysterious puzzles. The three clues used here were certainly vibrant food for thought. We had in a sense the candlestick, the butler, and the dining room as our Clue clues, kind of.

Maybe they add up, maybe they don’t.

One aspect that I believe might be equally relevant is that if any of the conjecture and speculation is of substance, another take on the matter is that perhaps we are beginning to see the intermixing of the data-based approach to AI with the rules-based approach to AI. I have previously noted that I believe we are going to need to enter into an era of neuro-symbolic AI to move things forward to the next level of AI capabilities, see my discussion at the link here.

In brief, we used to have the opinion that rules would be a means to devise AI. This was the likes of expert systems, rule-based systems, and knowledge-based systems. You would get people to divulge the rules they use to undertake tasks. Those rules would be entered or codified into an AI app. Sometimes this worked out well. At times, the approach was overly brittle and excessively time-consuming to devise.

The gloomy AI winter ensued.

Nowadays, the use of a data-based approach such as artificial neural networks is the hero and mainstay of modern AI. We are supposedly in the AI spring. Some assert that if we just keep increasing the size and scale, this will allow the existing approach to arrive at AGI. Others are doubtful of this. They tend to believe that we need to find some other approach, perhaps allied with the data-based approach.

Whenever they say this, the data-based converts will protest that things will go backward into the older and disdained ways if rules are allowed back into the game. The battle going on is one that has been ongoing for a long time in the AI field. There are the rules-focused proponents, known as the symbolics because they believe that we need to symbolically encode AI. The data-based proponents are typically known as the sub-symbolics since they operate at the ground level of data and are said to be less enamored of the symbolic level as an approach.

Neuro-symbolic proponents contend that we can combine the symbolic and sub-symbolic, doing so to get the best of both worlds. You could somewhat compellingly argue that if you used Q-learning and combined it in the ways I’ve described above, including immersing it seamlessly into LLMs and generative AI, the conglomeration blends the sub-symbolics and the symbolics together, to some extent.

Is that the necessary or at least a viable path to AGI?

Nobody can say for sure.

A few final comments for now.

The mainstream news has reported that at the recent Asia-Pacific Economic Cooperation (APEC) meetings in San Francisco, Sam Altman purportedly said this: "Four times now in the history of OpenAI, the most recent time was just in the last couple weeks, I've gotten to be in the room when we sort of push the veil of ignorance back and the frontier of discovery forward, and getting to do that is the professional honor of a lifetime.”

What did he witness in that room?

What exactly was it that so poetically and notably pushed away a stated veil of ignorance and shined a bright light on the frontier of forward discovery?

And, does the above detective gruntwork provide any forward discovery about what might have been put on display?

Sherlock Holmes famously said this: “How often have I said to you that when you have eliminated the impossible, whatever remains, however improbable, must be the truth?”

As a last word for now, Sherlock also said this: “The game is afoot.”
