I don't buy the data-moat argument in AI as much as everyone else does. Not all data is equal; quality matters a lot - e.g., textbooks vs. Twitter vs. Facebook text. You can learn far more "useful" things from textbooks than from random posts on Facebook. So I don't think the data inside TikTok or Facebook is very useful.
Data curation has become a big deal (e.g., "Textbooks Are All You Need", https://arxiv.org/abs/2306.11644), and so has synthetic data - e.g., training on a video game to learn physics, or training on LLM outputs in clever ways (e.g., the rumored Q*). I don't think Facebook/Instagram/TikTok/NYTimes data is anywhere near as useful as people claim it is.
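To make the curation point concrete, here's a toy sketch of quality-based filtering. This is not the phi-1 pipeline (the paper trains an LLM-based quality classifier); every heuristic, document, and threshold below is invented purely for illustration:

```python
# Toy data-curation sketch: keep documents that look "textbook-like"
# and drop low-signal social-media-style text. All heuristics here are
# made up for demonstration, not from the phi-1 paper.

def quality_score(text: str) -> float:
    """Crude proxy for quality: few ALL-CAPS words, longer words,
    and explanatory connectives ("because", "therefore", ...)."""
    words = text.split()
    if not words:
        return 0.0
    caps_ratio = sum(w.isupper() for w in words) / len(words)
    connectives = sum(
        text.lower().count(c) for c in ("because", "therefore", "for example")
    )
    avg_word_len = sum(len(w) for w in words) / len(words)
    return (1.0 - caps_ratio) * (1.0 + connectives) * min(avg_word_len / 5.0, 1.0)

corpus = [
    "Therefore, the derivative of x^2 is 2x, because the difference quotient...",
    "LOL OMG did u see that?!?! #viral",
]

# Keep only documents above a quality threshold, mimicking curation.
curated = [doc for doc in corpus if quality_score(doc) > 0.8]
print(curated)  # only the textbook-style sentence survives
```

Real curation pipelines replace the hand-rolled score with a trained classifier, but the shape is the same: score every document, keep the top slice.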
That’s a fair point and makes sense. I still think proprietary data is valuable, though - otherwise Google wouldn’t have paid Reddit for its content. What do you think are more defensible moats in LLMs?
Reddit, I think, actually is useful information for a lot of user queries. Same with Stack Overflow. Neither is large in token count, but both punch above their weight in content.
I think the moat at this point is execution - consistently shipping useful improvements over time, the way OpenAI has. But it's not a static moat; we've seen competitors like Anthropic's Claude catch up quickly. NVIDIA (NVDA) seems to be the only company with any real moat in AI.