You're viewing the mstdn.social public feed.
  • Jun 25, 2026, 11:00 AM

    @bweller @lnicola @nixCraft Even if you don't agree with @lnicola, it's OK to keep a civil tone. Yes, he made a strong statement that he hasn't backed up (that mixing up scraping and distilling is bad reporting). I'm still waiting for him to say exactly why the difference matters in this context.

Replies

  • Jun 25, 2026, 11:40 AM

    @alexmu @nixCraft Not sure if you saw my whole thread, but scraping is bad mainly for three reasons: 1. copyright concerns of the website owners, 2. unauthorized access to insecure web apps, 3. CPU and bandwidth consumption, especially with less optimized apps like Forgejo.

    1 doesn't apply because Anthropic holds no copyright. 2 is not an issue because it's all authorized. 3 doesn't apply because you can distill a model at very reasonable rate limits.

    1/3

  • Jun 25, 2026, 12:50 PM

    @lnicola @nixCraft While technically (almost) true, the difference doesn't seem relevant in this context. The caveat is there because you seem to imply that scraping inherently breaks copyright law, which it does not. I'd point you to Google, if you have any doubts.

    Sure, saying scraping when the original source used the term distilling is sloppy. But that doesn't make it "bad journalism".

  • Jun 25, 2026, 1:28 PM

    @alexmu @nixCraft I don't really see your point. Anthropic did not mention scraping, so why would you, as a journalist, bring it up instead of using the correct term?

    It's like bringing up your neighbour's dog that bit you when reporting on a new cat disease. Yes, cats and dogs can both be pets, but there's no closer relationship.

    And if you think web scraping is legal and almost harmless, may I refer you to all the complaints about "AI scrapers"?

  • Jun 25, 2026, 1:43 PM

    @lnicola @nixCraft There are many ways of scraping. Just because LLM companies have aggressive scrapers that disregard robots.txt and don't throttle requests doesn't mean all scrapers are badly behaved. But generalising from "people complain about LLM scrapers" (rightly so) to "all scraping is bad" (I think you may have implied that all scraping breaks copyright as well, which is a non sequitur) is just as sloppy as mixing up scraping with distilling.

  • Jun 25, 2026, 2:03 PM

    @alexmu @nixCraft I did not say it's illegal or against copyright. As Cloudflare puts it: "content theft", "degraded site performance" (cloudflare.com/learning/ai/how). It also "wastes application resources, skews analytics, compromises user accounts, and forces developers to build and maintain brittle, custom security logic" (cloudflare.com/products/bot-mi). I think those are valid concerns, regardless of legality.

    None of these apply to Anthropic, so can you explain why scraping is relevant at all?

  • Jun 25, 2026, 2:53 PM

    @lnicola @nixCraft It's not. You made a point of the difference. You could have ignored the sloppy wording, but you chose to have a go at what you perceived as a post biased against LLMs.

  • Jun 25, 2026, 3:10 PM

    @alexmu @nixCraft Where does bias come into this? If it reported that Anthropic characterizes usage indistinguishable from legitimate use as an "attack", would that be biased for or against LLMs?