Comments
Apr 8 · Liked by Portsea Capital

I don't buy the data moats in AI as much as everyone else does. Not all data is the same; quality matters a lot. Compare textbooks vs. Twitter vs. Facebook text: you can learn far more "useful" things from textbooks than from random posts on Facebook. So I don't think the data inside TikTok or Facebook is very valuable.

Curating data has become a big thing (e.g., "Textbooks Are All You Need", https://arxiv.org/abs/2306.11644), and so has synthetic data: train on a video game to learn physics, or train on LLM outputs in clever ways (e.g., Q*). I don't think Facebook/Instagram/TikTok/NYTimes data is anywhere near as useful as people claim.
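
For concreteness, here is a rough sketch of what threshold-based quality filtering could look like. The "Textbooks Are All You Need" paper trained a classifier on GPT-4 quality annotations; the `quality_score` heuristic below is a made-up stand-in for that, not the paper's actual model, so treat this as illustrative only:

```python
# Illustrative sketch of quality-based data curation, in the spirit of
# "Textbooks Are All You Need" (arXiv:2306.11644). The paper trained a
# classifier on GPT-4 quality annotations; quality_score() below is a
# crude hypothetical stand-in, not their actual model.

def quality_score(text: str) -> float:
    """Return a fake 'educational value' score in [0, 1].

    Stand-in heuristic: longer, more structured text scores higher.
    A real pipeline would use a trained classifier or an LLM judge.
    """
    words = text.split()
    return min(1.0, len(words) / 50)

def curate(corpus: list[str], threshold: float = 0.3) -> list[str]:
    """Keep only documents whose quality score clears the threshold."""
    return [doc for doc in corpus if quality_score(doc) >= threshold]

if __name__ == "__main__":
    docs = [
        "lol nice pic",  # low-signal social-media text
        "A derivative measures how fast a function changes. Formally, "
        "f'(x) is the limit of the difference quotient as h approaches "
        "zero, which generalizes the notion of slope to curves.",
    ]
    print(curate(docs))  # only the textbook-style passage survives
```

The point of the sketch: under this kind of filter, volume alone (TikTok/Facebook-scale corpora) buys you little, because most of it scores below the threshold.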

Author

That’s a fair point and makes sense. I still think proprietary data is valuable; otherwise Google wouldn’t have paid Reddit for its content. What do you think are more defensible moats in LLMs?

Apr 9 · Liked by Portsea Capital

Reddit, I think, actually holds useful information for a lot of user queries. Same with Stack Overflow. Neither is large in token count, but both are unusually useful per token.

I think the moat at this point is execution: consistently delivering useful improvements over time, the way OpenAI has. But it's not a static moat, since we've seen competitors like Anthropic's Claude catch up so quickly. NVDA seems to be the only one with any real moat in AI.
