This is a list of questions I have about AI inspired by conversations with technologists, founders, investors, and researchers. It is also inspired by other lists that I find interesting. It is an evergreen list that I'll continue to update as I learn more about AI - it is very much a list of questions - not answers. Please keep this in mind while you're reading!
If you have thoughts about any of these questions, I would love to hear them.
GPT-4, Opus, and similar models were trained on vast amounts of the world’s public data. What percentage of all usable data does their training set amount to? My intuition says a low double-digit percentage. Another question is how much high-quality data exists in the world; my suspicion is that GPT-4’s training set is a significantly larger percentage of that number. What are the major untapped data silos? Is somebody keeping a list of them somewhere? Examples:
Generating useful synthetic data using simulation engines has shown promising results in specific domains (thank you Waymo). Extending this approach to foundation models could be worth exploring, especially as multi-modal data becomes more important.
There are several challenges to consider:
There are also many potential benefits:
Many smart people disagree that this will be useful. I’ll leave it to those building the models and datasets, who have more informed opinions on this topic than I do, to decide, but it feels like a useful experiment to run if we have good enough simulation engines that are easily accessible.
How far would a $10Bn training run get us? How would that model compare to GPT-4/Opus? Scaling compute is an interesting question because if it truly delivers another 100x improvement from here, we should probably do it. Will GPT-7 be a $10Bn training run backed by sophisticated world models? Once we get to that scale, what emergent capabilities will we see in models?
There are holes to poke in this argument, but it can’t be denied that model capabilities have grown tremendously even in the scaling from GPT-3.5 → GPT-4o/Opus. Scaling compute 100x and seeing what other emergent capabilities these models exhibit is going to be fascinating.
* It's important to consider the potential limitations of scaling efforts. There may be diminishing returns as we push the boundaries of compute, and other bottlenecks, such as data availability and quality, could become more prominent.
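As a rough way to ground the $10Bn question, here's a back-of-envelope sketch using the Chinchilla compute-optimal heuristic (training FLOPs ≈ 6·N·D, with roughly 20 training tokens per parameter at the optimum). The dollars-to-FLOPs conversion below is a made-up assumption purely for illustration, not a real price:

```python
import math

def chinchilla_optimal(flops):
    """Rough compute-optimal (params, tokens) split for a FLOP budget.

    Assumes training cost C ~ 6 * N * D and the Chinchilla finding of
    roughly 20 tokens per parameter at the optimum, so C ~ 120 * N^2.
    """
    params = math.sqrt(flops / 120)
    tokens = 20 * params
    return params, tokens

# Hypothetical: $10Bn at an *assumed* $2 per 1e17 training FLOPs
# (illustrative only; real $/FLOP varies enormously by hardware and year).
budget_flops = (10e9 / 2) * 1e17
n, d = chinchilla_optimal(budget_flops)
print(f"~{n/1e12:.1f}T params, ~{d/1e12:.0f}T tokens")
```

Under these toy assumptions a $10Bn run lands in trillion-parameter, tens-of-trillions-of-tokens territory, which loops straight back to the data-availability bottleneck above.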
Llama3 8B is nearly as powerful as Llama2 70B. There is a Moore’s Law-style argument to be had here around capability per model generation, and Llama3 8B coming close to Llama2 70B is the perfect example. In two or three more generations, will Llama5 4B or Phi-5 1.5B be as powerful as, say, Llama3 70B? Where does this cap out? What does this mean for the types of companies you could build if you had a local-first, privacy-focused small model on your phone or laptop that was actually quite capable?
Self-attention in traditional transformers scales quadratically with context length, which leads to diminishing returns per dollar as contexts grow. The transformer architecture likely still has a long way to go before something replaces it, but approaches like RWKV have shown that sub-quadratic architectures can work, at least at smaller scales (7B). What would a 70B or 200B RWKV model look like compared to transformer models of the same size?
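To make the quadratic-vs-sub-quadratic point concrete, here's a toy FLOP count comparing self-attention against a linear-time recurrence in the spirit of RWKV. The constants are illustrative rather than measured; the takeaway is that the gap grows linearly with context length:

```python
def attention_flops(seq_len, width):
    # QK^T score matrix plus attention-weighted values:
    # roughly 2 * L^2 * d multiply-adds per layer.
    return 2 * seq_len**2 * width

def linear_flops(seq_len, width):
    # A linear recurrence touches each token a constant number of times:
    # roughly O(L * d) per layer.
    return 2 * seq_len * width

for L in (1_000, 10_000, 100_000):
    ratio = attention_flops(L, 4096) / linear_flops(L, 4096)
    print(f"L={L:>7,}: attention costs {ratio:,.0f}x the linear-time sketch")
```

With these simplified formulas the ratio is exactly the sequence length, which is why long-context workloads are where sub-quadratic architectures look most attractive.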
Are there chip architectures that scale better than a standard H100 by specializing in transformers? Etched and many others are going after this market. As far as I know, none have been tested on a large training run yet. It will be interesting to see that happen!
Many smarter people have asked this question and answered it more elegantly. I personally like the following framework:
This framework is the furthest thing from perfect and maps AGI far too closely to human consciousness. Maybe the right answer is that their version of consciousness is far different from that of any human or biological species. I like to think of cats and dogs as examples: they have some level of consciousness - they build relationships, care more for their owners than for other humans, and have emotional intelligence. Yet they are far less cognitively powerful than humans. Their forms of consciousness are different from a human's, yet in a Venn diagram there are strange overlaps around care, love, and empathy.
In the aftermath of previous revolutions like the agricultural and industrial revolutions, humans found themselves with more free time as new technologies automated tasks and increased efficiency. But the transition was often bumpy, with social and economic disruption as people adapted to the new reality. AI will be no different, and the transition may be even less smooth than in previous revolutions.
I have no doubt that there will be a steady-state where humans find better ways to spend their time thanks to this technological shift, but it feels like the transition period from pre-to-post AGI is going to be bumpy for most involved.
It seems likely that major nations will develop (or acquire, or fund) their own foundation models for a few reasons:
That said, developing and maintaining SOTA foundation models is extremely challenging. This may lead some nations to partner together, or smaller nations to rely on models developed by larger powers or multinational corporations. The geopolitical landscape of foundation models will likely be shaped by a complex interplay of technological, economic, and strategic factors, with nations weighing the benefits and costs of pursuing their own models versus relying on third-party models. Which leads to my next question.
If models truly get as powerful as some people expect they might, will a large country launch a Manhattan Project for AGI? As far as I know, everybody building towards AGI right now is a company whose primary goal is to become a larger, more impactful company. No matter what the company mission is, they are still private companies with goals and ambitions that don’t align with those of a government. Will this eventually lead to a Manhattan Project where, for example, the US government joins the race for AGI? In that world, who would play Vannevar Bush?
If OpenAI or similar unlocks AGI, especially before other competitors, how powerful will they become as an organization? How will they be regulated? What would compel them to share it with the world on an even playing field?
Google, Meta, Tesla, Amazon, and many other large companies are benefiting tremendously from generative AI. How much of these new markets will be dominated by companies that have already solved distribution vs startups? I think a tremendous number of new startups will be built around generative AI, but I couldn’t guess what percentage of the new GDP they will capture vs incumbents.
AI friends, girlfriends, and boyfriends are going to impact our real-life relationships as they become more normal. Imagine even talking with deceased loved ones - at some point, we’ll be able to (re)create any person or character we want and interact with them instead of real humans. Per this tweet, this is already happening.
Describe what you want to an LLM → render a 3D model with an interactive demo → 3D print is a workflow I’d be very excited about. Maybe this means there should be a consumer-ish Autodesk that lets people create and edit their 3D models, assuming a world where models are good enough to generate accurate 3D models from text/image/video inputs.
AIs will increase internet usage 1000x. Can we handle that load?
QNNs could lead to exponential representation, quantum parallelism, enhanced optimization, quantum-inspired architectures, and the ability to solve complex problems more efficiently than classical neural networks [source].
Will this ever be useful? Quantum computers are so far from being practical that it’s hard to imagine this mattering in the next 10 years. I would suspect that by then, classical computing, new model architectures, etc. will have evolved so much that the conversation will be different.
The path to quantum transformers
Companies like Together, Fireworks, and co. are racing to compete on cost-per-token, tokens-per-second, etc., mainly leveraging software innovations to do so (although Groq is more hardware-focused). Will there be an AI-native cloud provider once all of this is mainstream? I struggle to see large public companies choosing vertical clouds over bundled solutions like Azure. I’m not debating whether they should exist; I’m questioning how big they will be and how they differentiate at scale.
Comparing this to the Renaissance - who is the equivalent of Cosimo de' Medici? What are their goals? Musk, Zuck, and Altman are three examples of people driving this forward who likely all have wildly different goals and success-states that they'd be happy with.