Reading list

The following is an opinionated list of resources related to digital technicalities (see homepage), ranging from computational linguistics to AI, from philosophy of science to retrogaming, from digital humanities to cybersecurity, and more. Updated several times a week, it provides both the original link and a preservation link (served by the Wayback Machine) for each resource. Resources can be sorted by publication date (retrieved through Wallabag, or through htmldate if the former fails; set to N/A if both scrapers fail to extract it from the HTML source) or by the date the resource was added to the list. Some materials may be behind a paywall (and the archived copy may consequently be unavailable); in such cases, you may try the browser extension Bypass Paywalls Clean, available for both Firefox and Chrome.
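The publication-date fallback described above (first scraper, then second scraper, then N/A) can be sketched roughly as follows. This is a minimal illustration, not the site's actual code: it assumes each scraper is wrapped in a callable that takes a URL and returns a date string or nothing, and the names `resolve_published`, `normalize`, and `NOT_AVAILABLE` are hypothetical. The real Wallabag and htmldate calls are deliberately left out.

```python
from datetime import datetime
from typing import Callable, Iterable, Optional

NOT_AVAILABLE = "N/A"  # placeholder used in the list when no date is found


def normalize(raw: Optional[str]) -> Optional[str]:
    """Return the date as YYYYMMDD, or None if it cannot be parsed."""
    if not raw:
        return None
    for fmt in ("%Y%m%d", "%Y-%m-%d"):
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%Y%m%d")
        except ValueError:
            continue
    return None


def resolve_published(
    url: str,
    extractors: Iterable[Callable[[str], Optional[str]]],
) -> str:
    """Try each extractor in order; fall back to N/A if all of them fail."""
    for extract in extractors:
        try:
            date = normalize(extract(url))
        except Exception:
            # A failing scraper is treated the same as "no date found".
            date = None
        if date:
            return date
    return NOT_AVAILABLE
```

In this sketch the extractors would be passed in priority order, for example a hypothetical Wallabag wrapper first and an htmldate wrapper second. Unparseable values, including a literal "None" string, end up as N/A.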

The list may be downloaded in .tsv format from here.
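For readers who want to work with the downloaded file, here is a minimal sketch of loading and sorting it. It assumes the .tsv has a header row with the same column names as the table below (Link, Published, Added, Archived) and that dates use the YYYYMMDD form shown in the list; the actual file layout may differ, and the function names are hypothetical.

```python
import csv
import io


def load_rows(tsv_text):
    """Parse tab-separated text with a header row into a list of dicts."""
    reader = csv.DictReader(
        io.StringIO(tsv_text),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # titles may contain quote characters
    )
    return list(reader)


def sort_by_date(rows, column="Published", newest_first=True):
    """Sort rows by a YYYYMMDD column; rows without a valid date go last."""
    def is_dated(row):
        value = (row.get(column) or "").strip()
        return len(value) == 8 and value.isdigit()

    dated = [row for row in rows if is_dated(row)]
    undated = [row for row in rows if not is_dated(row)]
    # YYYYMMDD strings sort chronologically when compared as text.
    dated.sort(key=lambda row: row[column].strip(), reverse=newest_first)
    return dated + undated
```

Entries whose date is N/A are kept at the end rather than dropped, so the sorted output still contains every resource.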

Last update: 16/01/2025

Link

Published

Added

Archived

Misinformation: A Flawed Concept

20241028

20241118

When Machine Learning Tells the Wrong Story

20241109

20241118

Liberation technology: dreams, politics, history

N/A

20241118

Punctuation is dead because the iPhone keyboard killed it

20241110

20241118

Network Of Time

20190101

20241118

Visualizing 13 million BlueSky users

20241112

20241118

How a stubborn computer scientist accidentally launched the deep learning boom

20241111

20241118

Meta Horizon Worlds Has Been Taken Over by Children

20241112

20241118

AI Chatbot Added to Mushroom Foraging Facebook Group Immediately Gives Tips for Cooking Dangerous Mushroom

20241112

20241118

Graph-based AI model maps the future of innovation

20241112

20241118

After Trump’s Victory, the 4B Movement Is Spreading Across TikTok

20241107

20241118

The Open Source Project DeFlock Is Mapping License Plate Surveillance Cameras All Over the World

20241111

20241118

Guardian will no longer post on Elon Musk’s X from its official accounts

20241113

20241118

The AI lab waging a guerrilla war over exploitative AI

20241113

20241118

Our brains are vector databases — here’s why that’s helpful when using AI

20241116

20241118

ChatGPT is Slipping

20241117

20241118

N/A

Super Weights in LLMs - How Pruning Them Destroys a LLM’s Ability to Generate Text ?

20241118

20241118

Drop #9. UDO: The Weird Magic of Digital Folkore

20240119

20241118

AI and Ways of Seeing: Q&A with Lauren Tilton

20241112

20241118

AI-generated poetry is indistinguishable from human-written poetry and is rated more favorably - Scientific Reports

20241114

20241118

N/A

OpenAI, Google and Anthropic Are Struggling to Build More Advanced AI

N/A

20241118

N/A

Something weird is happening with LLMs and chess

20241114

20241118

The metaphors of artificial intelligence

N/A

20241118

When ads shock: subtle ways that disgust can shape our buying habits

N/A

20241118

Academic papers retracted due to … software licensing?

20241114

20241118

The ambiguous “use” / GioCities

20241115

20241118

How AI Could Break the Career Ladder

N/A

20241118

The New Hatred of Technology

20241115

20241118

Despite its impressive output, generative AI doesn’t have a coherent understanding of the world, researchers suggest

20241105

20241118

Misinformation really does spread like a virus, suggest mathematical models drawn from epidemiology

N/A

20241118

Sustainable Web Interest Group is Formed

20241104

20241118

Replicating GPT-2, the most dangerous NLP model (with Aaron Gokaslan) (Ep. 79)

20190923

20241118

Exploring Internet traffic shifts and cyber attacks during the 2024 US election

20241106

20241118

how a neuron learns

20240202

20241118

The images of Spain’s floods weren’t created by AI. The trouble is, people think they were

20241109

20241118

OpenAI’s new “Orion” model reportedly shows small gains over GPT-4

20241110

20241118

Teens learn a new conspiracy theory every week on social media, yet most schools aren’t teaching media literacy

N/A

20241118

IMG_0416

20241103

20241118

Anthropic hires its first “AI welfare” researcher

20241111

20241118

Apple AI notification summaries exist; rarely useful, often hilarious

20241112

20241118

Patterns in Information - Lachlan Gray

N/A

20241118

The Beginner’s Guide to Visual Prompt Injections: Invisibility Cloaks, Cannibalistic Adverts, and Robot Women / Lakera – Protecting AI teams that disrupt the world.

20241113

20241118

Releasing the largest multilingual open pretraining dataset

20241113

20241118

AI has a stupid secret: we’re still not sure how to test for human levels of intelligence

20241004

20241118

The Commoditization of LLMs – Communications of the ACM

20240912

20241118

AI search could break the web

20241031

20241118

The Internet Archive is even more essential than I realized

N/A

20241118

The Fairness of Fact-checking and Its Impact on Social Media / TechPolicy.Press

20241104

20241118

Generative AI Has a Massive E-Waste Problem

20241104

20241118

The Third-Party Script Breach That Shook The World

20201016

20241118

Seeing Like a Programmer (LambdaConf 2024) — Sympolymathesy, by Chris Krycho

20240507

20241118

AI overwhelmingly prefers white and male job candidates in new test of resume-screening bias

N/A

20241118

Despite its impressive output, generative AI doesn’t have a coherent understanding of the world

20241105

20241118

Why the deep learning boom caught almost everyone by surprise

20241105

20241118

A global dataset of 7 billion individuals with socio-economic characteristics - Scientific Data

20241007

20241118

No, The Web Is Not Dead

20240523

20241118

AI is consolidating corporate power in higher ed (opinion)

20241106

20241118

Google admits massive document leak related to search algorithm is authentic

20240530

20241118

For fame or a death wish? Kids’ TikTok challenge injuries stump psychiatrists

20241106

20241118

Project Analyzing Human Language Usage Shuts Down Because ‘Generative AI Has Polluted the Data’

20240919

20241118

NaNoWriMo Says Condemning AI Is ‘Classist and Ableist’

20240902

20241118

You Can Now See the Code That Helped End Apartheid

20241018

20241118

It’s time to retire the term “user”

20240419

20241118

Everything we know about ‘shadowbans’ on social media

N/A

20241118

Planes, trains, and smartphones

20241015

20241118

Are LLMs Any Good at Ranking People? – Wilsons Blog

20241018

20241118

AI could help people find common ground during deliberations

20241017

20241118

AI art: The end of creativity or the start of a new movement?

20241021

20241118

How AI is generating a ‘sea of sameness’ in job applications

20240908

20241118

You Should Probably Pay Attention to Tokenizers

20241021

20241118

Chatbot that caused teen’s suicide is now more dangerous for kids, lawsuit says

20241023

20241118

Former OpenAI Researcher Says Company Broke Copyright Law

20241023

20241118

Feds Say You Don’t Have a Right to Check Out Retro Video Games Like Library Books

20241025

20241118

Thoughts on the New Digital Feudalism

20241026

20241118

Open Source on its own is no alternative to Big Tech - Bert Hubert’s writings

20241026

20241118

N/A

Inside the U.S. Government-Bought Tool That Can Track Phones at Abortion Clinics

20241023

20241118

Instagram saves the best video quality for the most popular content

20241027

20241118

The Open Source AI Definition – 1.0

N/A

20241118

OpenAI says ChatGPT treats us all the same (most of the time)

20241015

20241118

N/A

Software freedom isn’t about licenses – it’s about power.

20210328

20241118

LinkedIn launches its first AI agent to take on the role of job recruiters / TechCrunch

20241029

20241118

Make It Ephemeral: Software Should Decay and Lose Data

20241030

20241118

Generative AI as an Icebreaker to Help Us Accept Other Ways of Thinking – Communications of the ACM

20241030

20241118

Embeddings are underrated

20241021

20241118

wrestling the web from corporate control requires making it boring again

20000101

20241118

How Everyone Got Lost in Netflix’s Endless Library

N/A

20241118

When Data Is Missing, Scientists Guess. Then Guess Again. / Quanta Magazine

20241002

20241118

Beyond the link tax: journalism and the changing nature of the internet - Halifax Examiner

20240917

20241118

Is big tech harming society? To find out, we need research – but it’s being manipulated by big tech itself

N/A

20241118

Man learns he’s being dumped via “dystopian” AI summary of texts

20241010

20241118

Open-source AI definition finally gets its first release candidate - and a compromise

20241009

20241118

Are humans the only ones that can be creative?

20241010

20241118

Cyber resilience act: Council adopts new law on security requirements for digital products

N/A

20241118

N/A

Amazon Dreams of AI Agents That Do the Shopping for You

20241009

20241118

FEMA adds misinformation to its list of disasters to clean up

20241008

20241118

TikTok executives know about app’s effect on teens, lawsuit documents allege

20241011

20241118

Maithra Raghu / The best AIs will be constructed not emergent

20240925

20241118

This AI Pioneer Thinks AI Is Dumber Than a Cat

N/A

20241118

The Editors Protecting Wikipedia from AI Hoaxes

20241009

20241118

Lessons from Plain Text / rugu

20241013

20241118

Stop aggregating away the signal in your data

20220303

20241118

People are using Google study software to make AI podcasts—and they’re weird and amazing

20241003

20241118

Tech Innovations to make the Tibetan Language a First-class Citizen in the Digital World - Buddhist Digital Resource Center

20241010

20241118

N/A

Reasoning failures highlighted by Apple research on LLMs

20241012

20241118

AI is the new plastic

20241002

20241118

What is Code?

N/A

20241118

N/A

AI-Powered Social Media Manipulation App Promises to ‘Shape Reality’

20241016

20241118

AI Avatars Are Doing Job Interviews Now

20240927

20241118

Google Serving AI-Generated Images of Mushrooms Could Have ‘Devastating Consequences’

20240924

20241118

RAG is not just text

20240928

20241118

An A.I. Model Helped Uncover 303 Previously Unseen Nazca Lines in Peru

20240927

20241118

If your AI seems smarter, it’s thanks to smarter human trainers

N/A

20241118

AI and globalisation are shaking up software developers’ world

N/A

20241118

New study reveals positive mood changes during video game play

20240925

20241118

How ‘Embeddings’ Encode What Words Mean — Sort Of / Quanta Magazine

20240918

20241118

Vanishing Culture: Preserving Cookbooks / Internet Archive Blogs

20240930

20241118

A.I. Pioneers Call for Protections Against ‘Catastrophic Risks’

20240916

20241118

The Modern CLI Renaissance

20240904

20241118

The 1970s librarians who revolutionised the challenge of search / Aeon Essays

20230605

20241118

Delving into “delve”

20240331

20241118

On Opting Out of Copyright

20240429

20241118

We’re losing our digital history. Can the Internet Archive save it?

20240916

20241118

Copyright Keepers Just Destroyed a Huge Digital Library

20240920

20241118

Technical writing is too important to leave to language models

20240709

20241118

Cold war spy satellites and AI detect ancient underground aqueducts

20240916

20241118

The Age of Software Artisans

N/A

20241118

Algorithms for the 21st Century

20060101

20241118

“Dead Internet theory” comes to life with new AI-powered social media app

20240918

20241118

The continuing tragedy of emoji on the web

20240917

20241118

Chatbots in science: What can ChatGPT do for you?

20240814

20241118

N/A

AI tool that can do ‘81 years of detective work in 30 hours’ trialled by police

20240923

20241118

Holy Hell, The Social Web Did Not Begin In 2008 - Bix Dot Blog

N/A

20241118

Please Don’t Ask AI If Something Is Poisonous

20240925

20241118

Google’s NotebookLM can help you dive deeper into YouTube videos

20240926

20241118

Project Overview ‹ AI-Implanted False Memories – MIT Media Lab

20240831

20241118

Greppability is an underrated code metric

20240829

20241118

Copyright Is Not a Tool to Silence Critics of Religious Education

20240828

20241118

The Imperial Origins of Big Data - Yale University Press

20240828

20241118

The /llms.txt file – llms-txt

20240903

20241118

Disappearing web and what to do about it.

20240816

20241118

Turn Your Code Into Pixel Art

20240102

20241118

US, Britain, EU to sign first international AI treaty

N/A

20241118

New AI model “learns” how to simulate Super Mario Bros. from video footage

20240905

20241118

GPT-fabricated scientific papers on Google Scholar: Key features, spread, and implications for preempting evidence manipulation / HKS Misinformation Review

20240903

20241118

Google is losing its status as a verb

20240906

20241118

Bitten by Unicode – pyATL

20240901

20241118

LLMs produce racist output when prompted in African American English

20240828

20241118

Jeremy Couillard’s video games capture what it’s like to be alive right now

20240209

20241118

ON ALGORITHMIC WAGE DISCRIMINATION - Columbia Law Review

20231120

20241118

Inrupt, Tim Berners-Lee’s Solid, and Me

20200824

20241118

Can You Trust Dr. Wikipedia?

20240906

20241118

An AI Bot Named James Has My Old Local News Job

N/A

20241118

Facebook admits to scraping every Australian adult user’s public photos and posts to train AI, with no opt-out option

20240910

20241118

The Network is the Territory

20240908

20241118

Grounding AI in reality with a little help from Data Commons

20241111

20241118

Did ChatGPT just message me… First?

N/A

20241118

The Tao of Unicode Sparklines

20210805

20241118

Move over, text: Video is the new medium of our lives

20240824

20241118

AI and the future of sex

20240826

20241118

Here’s how people are actually using AI

20240812

20241118

We need to prepare for ‘addictive intelligence’

20240805

20241118

A new public database lists all the ways AI could go wrong

20240814

20241118

The race to save our online lives from a digital dark age

20240819

20241118

How gamification took over the world

20240613

20241118

Algorithms are everywhere

20240227

20241118

Wikimedia’s CTO: In the age of AI, human contributors still matter

20240226

20241118

The online art catalogue that chronicles a stolen African heritage

20240104

20241118

Recapturing early-internet whimsy with HTML

20231221

20241118

The grassroots push to digitize India’s most precious documents

20231025

20241118

Stephen Wolfram thinks we need philosophers working on big questions around AI / TechCrunch

20240825

20241118

The Psychology of Immersion in Video Games

20100728

20241118

Research shows more than 80% of AI projects fail, wasting billions of dollars in capital and resources: Report / Tom’s Hardware

20240828

20241118

When It Comes to Artificial Intelligence, ‘Big Data’ Isn’t Everything

20240828

20241118

I spent an evening on a fictitious web

20240828

20241118

How to build a terrible RAG system - jxnl.co

20240107

20241118

Under Meredith Whittaker, Signal Is Out to Prove Surveillance Capitalism Wrong

20240828

20241118

Rearchiving 2 million hours of digital radio, a comprehensive process

20240828

20241118

Rediscovering the Small Web - Neustadt.fr

20200525

20241118

Google Thinks Beethoven Looks Like Mr. Bean

20240830

20241118

A new way to build neural networks could make AI more understandable

20240830

20241118

Why A.I. Isn’t Going to Make Art

N/A

20241118

Chatbots Are Primed to Warp Reality

20240830

20241118

Political posts on X could harm academics’ credibility, new study finds

20240828

20241118

What we can learn from vintage computing

20221213

20241118

Artificial intelligence is losing hype

N/A

20241118

Against nostalgia in computing

N/A

20241118

No one’s ready for this

20240822

20241118

Was Linguistic A.I. Created by Accident?

N/A

20241118

A Short History of Glitch Art: From Inception to the Present Day

N/A

20241118

Facebook Banned Me for Life Because I Help People Use It Less

20211007

20241118

More than calculators: Why large language models threaten learning, teaching, and education

N/A

20241118

Olivetti Programma 101: at the origins of the Personal Computer / Inexhibit

20170212

20241118

Capt. Grace Hopper on Future Possibilities: Data, Hardware, Software, and People (1982)

20240826

20241118

Inside the long quest to advance Chinese writing technology

20240826

20241118

What happened to BERT & T5? On Transformer Encoders, PrefixLM and Denoising Objectives — Yi Tay

20240716

20241118

We need new metaphors that put life at the centre of biology / Aeon Essays

20240712

20241118

The Elegance of the ASCII Table

20240721

20241118

Data For The Ages, Take Two

20201024

20241118

Switzerland now requires all government software to be open source

20240729

20241118

The Data That Powers A.I. Is Disappearing Fast

20240719

20241118

Why AI Model Collapse Due to Self-Training Is a Growing Concern

20240724

20241118

Dirty Secrets of BookCorpus, a Key Dataset in Machine Learning

N/A

20241118

Tiktok LLM

20240624

20241118

The bizarre secrets I found investigating corrupt Winamp skins

20240724

20241118

Open File format in data analytics and AI - changing the international rules game

20240727

20241118

Data from deleted GitHub repos may not really be deleted

20240725

20241118

Ethics of Local LLMs: A Response to Zuckerberg’s “Open Source AI Manifesto”

20240725

20241118

The Backlash Against AI Scraping Is Real and Measurable

20240723

20241118

New study on AI-assisted creativity reveals an interesting social dilemma

20240728

20241118

Reimagining the Semantic Web: UCL’s Innovative Synthesis of AI and Web Science - Browser London

20240729

20241118

How embedding models encode semantic meaning

20240803

20241118

To preserve their work — and drafts of history — journalists take archiving into their own hands

20240731

20241118

Free Software Needs Free Tools :: Benjamin Mako Hill

20100604

20241118

Has the AI bubble burst? Wall Street wonders if artificial intelligen…

20240804

20241118

Debates on the nature of artificial general intelligence / Science

20240701

20241118

Myspace celebrates its 21st birthday. Do we still need it? / TribLIVE…

20240806

20241118

N/A

The Great Open Source Shake-up

20190908

20241118

Google and Meta struck secret ads deal to target teenagers

20240808

20241118

Demo: Predicting social science experimental results using LLMs

N/A

20241118

AI and the techno-utopian path not taken

20240801

20241118

Is It Time To Version Observability? (Signs Point To Yes)

20240807

20241118

How Algorithms Keep Workers Under Their Control

20240805

20241118

Cannibal AIs Could Risk Digital ‘Mad Cow Disease’ Without Fresh Data

20240806

20241118

Excess memes and ‘reply all’ emails are bad for climate, researcher warns

20240809

20241118

Research AI model unexpectedly modified its own code to extend runtime

20240814

20241118

Code as Art

20240817

20241118

Markov chains are funnier than LLMs

20240818

20241118

What If Data Is a Bad Idea?

20240818

20241118

OpenAI’s latest model will block the ‘ignore all previous instructions’ loophole

20240719

20241118

LLMs Know More Than What They Say

20240815

20241118

Whatever Happened to the Semantic Web?

20180527

20241118

On the cruelty of really teaching computing science (EWD 1036)

20090512

20241118

The 𝕆ᗪ⒟𝙞ȶч of Unicode Homoglyphs

N/A

20241119

ChatGPT outperforms undergrads in intro-level courses, falls short later

20240628

20241119

The telltale words that could identify generative AI text

20240701

20241119

Study reveals why AI models that analyze medical images can be biased

20240628

20241119

What I’ve learned about Open Source community over 30 years - OpenSource.net

20240629

20241119

Design as Thought: AI and the Future of Design

20240608

20241119

Google: AI Potentially Breaking Reality Is a Feature Not a Bug

20240703

20241119

Free and Open Source Software–and Other Market Failures – Communications of the ACM

20240703

20241119

Ever put content on the web? Microsoft says that it’s okay for them to steal it because it’s ‘freeware.’

20240628

20241119

How Good Is ChatGPT at Coding, Really?

20240706

20241119

Closing the Gap in Non-Latin-Script Data: A tool for building and navigating collections of DH research projects

20230809

20241119

Scripts, Transliteration, and Computer Access

19970101

20241119

Vision language models are blind

20000101

20241119

Is AI the beginning of the democratization of creativity?

20241119

20241119

We need visual programming. No, not like that.

20240101

20241119

Google Now Defaults to Not Indexing Your Content

20240715

20241119

Exploring the vastness of a website — Elliott’s Computer

20190818

20241119

It May Soon Be Legal to Jailbreak AI to Expose How it Works

20240718

20241119

Want to spot a deepfake? Look for the stars in their eyes

20240717

20241119

What Is ChatGPT Doing … and Why Does It Work?

20230214

20241119

Large language model data pipelines and Common Crawl (WARC/WAT/WET)

20230604

20241119

All the Data on Earth Can Fit in a Cup Full of DNA. This Is MIT’s Jurassic Park-Inspired Project

20240618

20241119

AI’s Brain Drain

20240603

20241119

Toolkits for the Mind

20150402

20241119

Why your brain is 3 milion more times efficient than GPT-4 - dead simple introduction to Embeddings, HNSW, ANNS, Vector Databases and their comparison based on experience from production project

20230722

20241119

AI Is Already Wreaking Havoc on Global Power Systems

N/A

20241119

A Third-World Critique of the Human Rights-Based Approach to Content Moderation / TechPolicy.Press

20240623

20241119

Surfing the (Human-Made) Internet

20240528

20241119

Human neuroscience is entering a new era — it mustn’t forget its human dimension

20240619

20241119

What the internet looked like in 1994, according to 15 webpages born that year

N/A

20241119

Measuring the Growth of the Web

19950101

20241119

Could AI Achieve General Intelligence, and What Would That Even Mean?

20240625

20241119

Researchers upend AI status quo by eliminating matrix multiplication in LLMs

20240625

20241119

Pokémon Go Players Have Unwittingly Trained AI to Navigate the World

20241119

20241123

Testing AI on language comprehension tasks reveals insensitivity to underlying meaning - Scientific Reports

20241114

20241123

The macOS LC_COLLATE hunt - Zhiming Wang

20200603

20241123

down in the posting mines / poking at ghosts

20241122

20241123

How OpenAI stress-tests its large language models

20241121

20241123

For Teens Online, Conspiracy Theories Are Commonplace. Media Literacy Is Not. - EdSurge News

20241107

20241123

Remembering Cyberia, the World’s First Ever Cyber Cafe

20241121

20241123

Autopoietic Networks

20150201

20241124

The Fantasy of Cozy Tech

20241120

20241124

‘All of a Sudden, Joe Blow Can See the CEO’s Emails’

20241121

20241124

Understanding the EU AI Act’s Impact and Ripple Effects in the US

20241008

20241124

Creating a public counterpoint for AI / The Mozilla Blog

20241002

20241124

Emoji history: the missing years

20240510

20241124

How tech giants cut corners to harvest data for AI

20240406

20241124

Autism & the Internet will defeat the Monoculture

20240512

20241124

Exploring Hacker News by mapping and analyzing 40 million posts and comments for fun

20240410

20241124

The British Library hack is a warning for all academic libraries

20240319

20241124

Transformers Are What You Do Not Need

N/A

20241124

The Website Obesity Crisis

20150720

20241124

Open Source Is at a Crossroads

20240507

20241124

The Small Web and Science

20240514

20241124

Using Simple Tools as a Radical Act of Independence

20241118

20241124

Why neural networks struggle with the Game of Life - TechTalks

20200916

20241124

How to Use GitHub Actions to Automate Data Scraping

N/A

20241124

State of Compute Access: How to Bridge the New Digital Divide

20231207

20241124

Indian Voters Are Being Bombarded With Millions of Deepfakes. Political Candidates Approve

20240520

20241124

When Online Content Disappears

20240517

20241124

You Don’t Own Your Content on the Internet. You Never Have.

20240521

20241124

Do text embeddings perfectly encode text?

20240305

20241124

A Brief Overview of Gender Bias in AI

20240408

20241124

An Introduction to the Problems of AI Consciousness

20230930

20241124

What Do LLMs Know About Linguistics? It Depends on How You Ask

20230709

20241124

Grounding Large Language Models in a Cognitive Foundation: How to Build Someone We Can Talk To

20230415

20241124

Large Language Model: world models or surface statistics?

20230121

20241124

Here’s what’s really going on inside an LLM’s neural network

20240522

20241124

Google Is Paying Reddit $60 Million for Fucksmith to Tell Its Users to Eat Glue

20240523

20241124

Meta is using your Instagram and Facebook photos to train its AI models

20240511

20241124

The Danger Of Superhuman AI Is Not What You Think / NOEMA

20240523

20241124

What Science Forgets

20240523

20241124

An Evolving Sixth Sense for AI

20240525

20241124

Facebook users say ‘amen’ to bizarre AI-generated images of Jesus

20240319

20241124

No, Today’s AI Isn’t Sentient. Here’s How We Know

20240522

20241124

To the brain, reading computer code is not the same as reading language

20201215

20241124

Partial Regurgitation and how LLMs really work

20240523

20241124

Big Data is Dead

20230207

20241124

N/A

What Comes After Open Source

20241024

20241124

N/A

How Many People Are Addicted to Social Media?

N/A

20241124

N/A

The next wave of AI hype will be geopolitical. You’re paying

20240529

20241124

Indexing all of Wikipedia, on a laptop

20240529

20241124

Engineering for Slow Internet – brr

20240530

20241124

Understanding Large Language Models – A Transformative Reading List

20230207

20241124

1-bit LLMs Could Solve AI’s Energy Demands

20240530

20241124

Tiny number of ‘supersharers’ spread the vast majority of fake news

N/A

20241124

FineWeb: decanting the web for the finest text data at scale - a Hugging Face Space by HuggingFaceFW

20240523

20241124

What AI thinks a beautiful woman looks like

20240531

20241124

An Overview of the Textual Data Analysis Workflow

20210401

20241124

domm / Perl / Chopping UTF-8

20240604

20241124

How Online Privacy Is Like Fishing

20240603

20241124

The Backrooms of the Internet Archive

20240601

20241124

After Social Media

20200106

20241124

Inside LLMs: understanding tokens - Generative AI France

20240610

20241124

An Anonymous-Messaging App Upended This High School - WSJ

20240610

20241124

N/A

Researchers Say There’s a Vulgar But More Accurate Term for AI Hallucinations

20240610

20241124

We need a social science of data

20240612

20241124

N/A

Hacker Theory - Journal #146

20131222

20241124

Good code is rarely read

20240606

20241124

Lies, Damned Lies, and Data Science

N/A

20241124

Ghosts in the ROM

N/A

20241124

How we Chunk - turning PDF’s into hierarchical structure for RAG

N/A

20241124

Coding a Neural Network from Scratch for Absolute Beginners

N/A

20241124

Overcoming the limits of current LLM

20240718

20241124

Demystifying cookies and tokens – Tommi Hovi

20240502

20241124

Scrape like a pro… but not like an AI company

20240729

20241124

I investigated millions of tweets from the Kremlin’s ‘troll factory’ and discovered classic propaganda techniques reimagined for the social media age

N/A

20241124

Breaking out of VRChat using a Unity bug

20241123

20241124

LLMs Aren’t Just “Trained On the Internet” Anymore

20240531

20241124

What are embeddings?

N/A

20241124

The YouTube Algorithm and Manufacturing Consent

20241117

20241124

N/A

Engines of Engagement – A Curious Book About Generative AI

20231018

20241124

The Iterative Paraphrasing Experiment: How GenAI Morphs a Story Over 100 Rewrites

N/A

20241125

Writing around an AI taboo

20240306

20241125

Data centers powering artificial intelligence could use more electricity than entire cities

20241123

20241125

It’s Surprisingly Easy to Jailbreak LLM-Driven Robots

20241111

20241125

The WTF-8 encoding

20220223

20241125

Do Coding Boot Camps Make Sense in an A.I. World?

20241125

20241125

Documenting the Assault on Disinformation and Hate Speech Research / TechPolicy.Press

20241124

20241125

‘Thirsty’ ChatGPT uses four times more water than previously thought

20241004

20241125

A City Is Not a Computer

N/A

20241125

Our Transparent Future

N/A

20241125

Here’s a ‘Brand-New’ Massive Multilingual Dataset for Machine Translation

20240403

20241125

#youtubepick The Enshittification of Internet is Here - Why and How?

20240417

20241125

Programming Is Mostly Thinking

20140929

20241125

Self-Reasoning Tokens, teaching models to think ahead.

20240420

20241125

The Tyranny of Content Algorithms

20240407

20241125

Lost Language of the Machines

20200101

20241125

Torching the Modern-Day Library of Alexandria

20170420

20241125

Exploring Small Language Models

20240424

20241125

The Man Who Killed Google Search

20240423

20241125

Source Code With Emoji

20240424

20241125

What can we learn from ChatGPT jailbreaks?

20240819

20241125

The Person Saving The Media You Love Is You - Aftermath

20240426

20241125

ChatGPT provides false information about people, and OpenAI can’t correct it

20240429

20241125

Mistakes that data science students make

20240428

20241125

You can’t just assume UTF-8

20240429

20241125

Understanding Software – Ceejbot’s notes

20240329

20241125

LLMs Can’t Do Probability - Brainsteam

20240501

20241125

New EU rules needed to address digital addiction / News / European Parliament

20231212

20241125

93% of Paint Splatters are Valid Perl Programs

20230101

20241125

N/A

THE NATURE OF CODE

20210515

20241125

N/A

The User Is On Their Own / selfaware soup

20240501

20241125

N/A

Humans share the web equally with bots, report warns amid fears of ‘dead internet’

20240417

20241125

Machine Unlearning in 2024

20231227

20241125

Decoding UTF8 with Parallel Extract

20171006

20241125

How are Embeddings Affecting Traditional Text Search?

20240506

20241125

Add Bluetooth to the Long List of Border Surveillance Technologies

20240506

20241125

The Antisocial Network: How the 90s Internet Died Like Diaryland

20240729

20241125

40 years later, a game for the ZX Spectrum will be once again broadcast over FM radio - Računalniški muzej

20240508

20241125

OpenAI destroyed a trove of books used to train AI models. The employees who collected the data are gone.

20240507

20241125

Stack Overflow bans users en masse for rebelling against OpenAI partnership — users banned for deleting answers to prevent them being used to train ChatGPT / Tom’s Hardware

20240508

20241125

Cartography of generative AI

20230101

20241125

Navigating the World of Large Language Models

20230529

20241127

How Chain-of-Thought Reasoning Helps Neural Networks Compute / Quanta Magazine

20240321

20241127

The linguistics search engine that overturned the federal mask mandate

20220607

20241127

We Need to Decarbonize Software

20240323

20241127

How Quickly Do Large Language Models Learn Unexpected Skills? / Quanta Magazine

20240213

20241127

The Lost Worlds of Telnet

20190310

20241127

A New Age of Enlightenment? A New Threat to Humanity?: The Impact of Artificial Intelligence by 2040 - Imagining the Digital Future Center

20240219

20241127

‘Collective AI’ expected to resemble Star Trek’s Borg — only nicer (hopefully)

20240327

20241127

Age Verification Laws Drag Us Back to the Dark Ages of the Internet

20240325

20241127

AI Narratives: On Screen! (Part 1)

20240402

20241127

Bernard Stiegler’s philosophy on how technology shapes our world / Aeon Essays

20240401

20241127

Understanding and managing the impact of Machine Learning models on the Web

20240820

20241127

Google Books Is Indexing AI-Generated Garbage

20240404

20241127

A Student’s Guide to Not Writing with ChatGPT

20241114

20241128

Someone Made a Dataset of One Million Bluesky Posts for ‘Machine Learning Research’

20241126

20241128

Looking for the Answer to the Question, “Do I Really Own the Digital Media I Paid For?”

20241126

20241128

A Revolution in How Robots Learn

20241111

20241128

How the Internet Archive’s “Free Digital Library” fell to the “fair use” test

20241119

20241128

Hackers, Wizards of the Electronic Age : Fabrice Florin : Free Download, Borrow, and Streaming : Internet Archive

20180819

20241128

N/A

Yes, That Viral LinkedIn Post You Read Was Probably AI-Generated

20241126

20241128

Five ways you might already encounter AI in cities (and not realise it)

N/A

20241128

About Ethnographic Data Visualization – The Side Unseen

20240101

20241128

OkCupid Study Reveals the Perils of Big-Data Science

20160514

20241128

Japanese scientists were pioneers of AI, yet they’re being written out of its history

N/A

20241129

Reddit overtakes X in popularity of social media platforms in UK

20241128

20241129

Conversational Game Theory – Collective Intelligence Engine for Ai and Humans

20240402

20241129

AI can now create a replica of your personality

20241120

20241129

The trouble with openness

20241127

20241129

Details matter with open source models

20241121

20241129

Smarter than GPT-4: Claude 3 AI catches researchers testing it

20240305

20241129

Sask. appeal court reserves decision on whether thumbs-up emoji can lock in $82K contract / CBC News

20240307

20241129

VR headsets can be hacked with an Inception-style attack

20240311

20241129

Advanced LLM AI models vs A Simple Question

20240313

20241129

What I learned from looking at 900 most popular open source AI tools

20240314

20241129

How bad are search results? Let’s compare Google, Bing, Marginalia, Kagi, Mwmbl, and ChatGPT

20200113

20241129

Evaluating Human Factors Beyond Lines of Code

20241121

20241129

The young people sifting through the internet’s worst horrors

20240111

20241130

Machine forgetting: How difficult it is to get AI to forget

N/A

20241130

Anthropic researchers find that AI models can be trained to deceive

20240113

20241130

Understanding ourselves through AI: a new frontier in personality assessment

20240114

20241130

Git branches as a social construct

20240114

20241130

Google Search Really Has Gotten Worse, Researchers Find

20240116

20241130

why lowercase letters save data

20231125

20241130

How Much of the World Is It Possible to Model?

20240115

20241130

Online Communication

20240120

20241130

Nightshade, the free tool that ‘poisons’ AI models, is now available for artists to use

N/A

20241130

Learn by Doing: How LLMs Should Reshape Education

20240122

20241130

A Shocking Amount of the Web is Machine Translated: Insights from Multi-Way Parallelism

20240111

20241130

What Home Videotaping Can Tell Us About Generative AI

20240124

20241130

Social Media, AI, and the Battle for Your Brain

20231221

20241130

Why Is the Web So Monotonous? Google.

20220804

20241130

Markov Chains Are The Original Language Models

20231011

20241130

Beyond Self-Attention: How a Small Language Model Predicts the Next Token

20240201

20241130

First-Gen Social Media Users Have Nowhere to Go

20231106

20241130

Could AI Disrupt Peer Review?

20240206

20241130

Art in the age of ones and zeros: Datamoshing

20170302

20241130

A search engine in 80 lines of Python

20240205

20241130

Homesteading the Noosphere

20020802

20241130

ChatGPT knows things that Google doesn’t

20240125

20241130

The Internet Is Being Ruined by Bloated Junk

20240115

20241130

Thinking about High-Quality Human Data

20240205

20241130

Video Games Are Mourning the Old, Weird, Clunky Internet

20240205

20241130

When Words Cannot Describe: Designing For AI Beyond Conversational Interfaces

20240202

20241130

AI Reveals Hotspots of Climate Denial

20240214

20241130

How a ragtag band of internet friends became the best at forecasting world events

20240213

20241130

Phallocentricity in GPT-J’s bizarre stratified ontology

20240217

20241130

The rise and fall of robots.txt

20240214

20241130

Subprime Intelligence

20240219

20241130

New report: 60% of OpenAI model’s responses contain plagiarism

N/A

20241130

Data will not tell you what to do

20240221

20241130

Can a programming language implement time travel?

20240212

20241130

Vending machine error reveals secret face image database of college students

20240223

20241130

Resurrecting loved ones as AI ‘ghosts’ could harm your mental health

20240226

20241130

Weapons of Mass Hate Dissemination: The Use of Artificial Intelligence by Right-Wing Extremists - GNET

20240223

20241130

N/A

The internet turned into a crowded mall. Now you need a corner shop. / Pith & Pip

20240628

20241130

N/A

Practico-inertia

20240301

20241130

Millions of research papers at risk of disappearing from the Internet

20240304

20241130

Author Cory Doctorow has a theory about why all tech and social platforms eventually decline

20240303

20241130

The Ideal Social Network

20231023

20241130

The women who coined the expression ‘Surfing the Internet’

20240603

20241130

Screen time robs average toddler of hearing 1,000 words spoken by adult a day, study finds

20240304

20241130

A Bug in Early Creative Commons Licenses Has Enabled a New Breed of Superpredator

N/A

20241130

AI Prompt Engineering Is Dead

20240306

20241130

Atlas of internet surveillance maps ownership of network infrastructures worldwide

20240305

20241130

Open-source champion Kelsey Hightower on the promise of Bluesky

20241126

20241201

Why We See Digital Ads After Talking About Something / McNutt & Partners

20210125

20241201

‘His Facebook was a shrine to my face’: the day I caught my catfish

20241130

20241201

Thoughts on the software industry

20220803

20241202

Open Source AI Definition Erodes the Meaning of “Open Source”

20241031

20241202

The Guy Behind the Most Nostalgic Sites on the Internet

20241129

20241202

Modelling Historical Information with Structured Assertion Records

20241129

20241202

The Myth of Objective Data

20230417

20241202

An Open Source Python Library for Anonymizing Sensitive Data - Scientific Data

20241126

20241202

Can Google Scholar survive the AI revolution?

20241119

20241203

You Have One Voice / Hazel Weakly

20240101

20241203

‘Brain rot’ named Oxford Word of the Year 2024 - Oxford University Press

20241202

20241203

How AI Log Analysis Is Shaping Observability’s Future

20241122

20241203

Can a Comma Solve a Crime?

20241121

20241203

How ChatGPT Search (Mis)represents Publisher Content

N/A

20241203

The Evolution of Machine Translation: A Brief History and What’s Coming Next

N/A

20241203

N/A

Privacy Disasters: FaceHuggers Are Eating Your Skeets

20241202

20241203

N/A

What is Software Anyways? Where Does it Exist?

20240101

20241204

Why an Octopus-like Creature Has Come to Symbolize the State of A.I.

20230530

20241204

New datasets will train AI models to think like scientists

20241202

20241204

Social media algorithms can change your views in just a single day

20241128

20241204

Combining linguistics, archaeology and ancient DNA genetics to understand deep human history

N/A

20241204

Your Bluesky Posts Are Probably In A Bunch of Datasets Now

20241203

20241204

Opinion: Students’ tech skills should be nurtured, not punished

20241130

20241204

The Beginning of the End of Big Tech

20241126

20241205

Teaching Critical Reasoning with AI: Humiliation Rituals

20241204

20241205

Something’s Rotten with the State of Our Archives.

20241110

20241205

Social Media is Disproportionately Hurting Girls

20241204

20241206

She Joined Facebook to Fight Terror. Now She’s Convinced We Need to Fight Facebook.

20241204

20241207

Ethical Web Scraping: Legal Insights and Best Practices

20241206

20241207

On the foolishness of “natural language programming”. (EWD 667)

20101119

20241207

Your Internet Shouldn’t Be My Internet

20241129

20241207

Information is useful if it’s re-usable

20241208

20241209

Report: Tokyo University Used “Tiananmen Square” Keyword to Block Chinese Admissions - Unseen Japan

20241207

20241209

Feeling Outraged? Think Twice Before Hitting “Share.”

20241201

20241210

Researchers Use AI To Turn Sound Recordings Into Accurate Street Images

20241127

20241210

How humans write programs

20180116

20241210

How WhatsApp ate the world

20241209

20241211

Sorry, Social Media Is Never Getting Any Better

20241209

20241211

N/A

15 Times to use AI, and 5 Not to

20241209

20241211

The Hottest New Coding Language is … English

20241210

20241211

Open source projects drown in bad bug reports penned by AI

20241210

20241212

Researchers reduce bias in AI models while preserving or improving accuracy

20241211

20241212

N/A

The Need to Make Content Moderation Transparent / TechPolicy.Press

20241211

20241212

N/A

Stop using generative AI as a search engine

20241205

20241212

The Paradox of the Internet

20240411

20241214

Integration of Music and Art in a Science and Engineering-Based University

20241210

20241214

Do I Really Own the Digital Media I Bought?

20180917

20241215

Cascading collapse of online social networks - Scientific Reports

20171201

20241215

N/A

How Silicon Valley is disrupting democracy

20241213

20241215

Computing inside an AI / Will Whitney

20241212

20241216

We’ve Been Here Before

20241214

20241217

PLATO: How an educational computer system from the ’60s shaped the future

20230317

20241218

Where did mainstream media come from? - The History of the Web

20241210

20241218

AI and Internet Hygiene

20241001

20241219

Spain introduces bill to combat online fake news

20241217

20241219

EU opens investigation into TikTok over election interference

N/A

20241219

Anti-hype LLM reading list

20230820

20241222

Encoding Differentials: Why Charset Matters

20240715

20241223

Gemini - So you may be breaking copyright law.

20240529

20241226

N/A

How AI deepfakes polluted elections in 2024

20241221

20241228

Linguistics - A test to measure AI intelligence

20241227

20241228

What Statistics Can and Can’t Tell Us About Ourselves

20190828

20241229

The old-new epistemology of digital journalism: how algorithms and filter bubbles are (re)creating modern metanarratives - Humanities and Social Sciences Communications

20230710

20241229

More Than Half of All Google Search Takedowns Now Come from Link-Busters * TorrentFreak

20241230

20250102

Why Khanmigo (and Other Learning Chatbots) Will Fail - BetterSchooling

20241213

20250102

Kids can’t use computers… and this is why it should worry you

20130729

20250105

Lockdown. The coming war on general-purpose computing

N/A

20250108

Don’t use cosine similarity carelessly

20250114

20250116

OpenAI’s AI reasoning model ‘thinks’ in Chinese sometimes and no one really knows why / TechCrunch

20250114

20250116