Reading list

Disclaimer

Some resources in this list may:

Reflect biases or use offensive language
Share views that are unscientific or controversial

They are included not as endorsements but as historical snapshots of how certain topics have been discussed in online communities. If you have questions or need context, feel free to reach out via email (see FAQ) or on Bluesky.

What follows is an opinionated list of resources related to digital technicalities (see homepage), ranging from computational linguistics to AI, from philosophy of science to retrogaming, from digital humanities to cybersecurity, and more. It is updated several times a week, and each entry carries both the original link and a preservation link (served by the Wayback Machine). Resources can be sorted by publication date or by the date they were added to the list. The publication date is retrieved through Wallabag, or through htmldate if the former fails; it is set to N/A if both scrapers fail to extract it from the HTML source. Some materials may be behind a paywall, in which case the archived copy may also be unavailable; you may then try the browser extension Bypass Paywalls Clean, available for both Firefox and Chrome.
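The date lookup described above is a fallback chain. Here is a minimal sketch, assuming each scraper is wrapped in a function that returns a `YYYYMMDD` string or `None`. The name `publication_date` and the two stub extractors are hypothetical stand-ins, not the actual Wallabag or htmldate calls:

```python
from typing import Callable, Iterable, Optional

# Hypothetical type: an extractor takes HTML and returns "YYYYMMDD" or None.
Extractor = Callable[[str], Optional[str]]


def publication_date(html: str, extractors: Iterable[Extractor]) -> str:
    """Return the first date any extractor finds, or "N/A" if all fail."""
    for extract in extractors:
        try:
            date = extract(html)
        except Exception:
            # A crashing scraper is treated the same as one that finds nothing.
            date = None
        if date:
            return date
    return "N/A"


# Stub extractors for illustration only (assumptions, not real scraper APIs).
def primary_stub(html: str) -> Optional[str]:
    return None  # pretend the first scraper found no date


def fallback_stub(html: str) -> Optional[str]:
    return "20241028" if "2024-10-28" in html else None


print(publication_date("<time>2024-10-28</time>", [primary_stub, fallback_stub]))
print(publication_date("<p>no date here</p>", [primary_stub, fallback_stub]))
```

Passing the extractors as a list keeps their order explicit, so a third scraper could be appended without touching the loop.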

The list may be downloaded in .tsv format from here.
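For anyone working with the download, here is a rough sketch of loading the file and sorting it by publication date. It assumes the .tsv uses the same four columns as the table on this page and `YYYYMMDD` dates; the sample rows and all function names are illustrative only:

```python
import csv
import io
from datetime import date, datetime
from typing import Optional

# Sample rows in the assumed layout: Link, Published, Added, Archived.
SAMPLE_TSV = (
    "Link\tPublished\tAdded\tArchived\n"
    "Misinformation: A Flawed Concept\t20241028\t20241118\t\n"
    "Liberation technology: dreams, politics, history\tN/A\t20241118\t\n"
    "Network Of Time\t20190101\t20241118\t\n"
)


def parse_date(value: Optional[str]) -> Optional[date]:
    """Parse a YYYYMMDD string; return None for N/A or anything unparseable."""
    try:
        return datetime.strptime((value or "").strip(), "%Y%m%d").date()
    except ValueError:
        return None


def load_rows(text: str) -> list[dict]:
    """Read tab-separated text into dictionaries keyed by the header row."""
    # QUOTE_NONE keeps quotation marks inside titles from being read as CSV quoting.
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def sort_by_published(rows: list[dict]) -> list[dict]:
    """Oldest first; rows without a usable date go to the end."""
    def key(row: dict):
        parsed = parse_date(row["Published"])
        return (parsed is None, parsed or date.max)
    return sorted(rows, key=key)


rows = sort_by_published(load_rows(SAMPLE_TSV))
for row in rows:
    print(row["Published"], row["Link"])
```

To run it on the real file, replace `SAMPLE_TSV` with the contents of the downloaded .tsv, read as UTF-8.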

Last update: 10/01/2026

Link	Published	Added	Archived
Misinformation: A Flawed Concept	20241028	20241118
When Machine Learning Tells the Wrong Story	20241109	20241118
Liberation technology: dreams, politics, history	N/A	20241118
Punctuation is dead because the iPhone keyboard killed it	20241110	20241118
Network Of Time	20190101	20241118
Visualizing 13 million BlueSky users	20241112	20241118
How a stubborn computer scientist accidentally launched the deep learning boom	20241111	20241118
Meta Horizon Worlds Has Been Taken Over by Children	20241112	20241118
AI Chatbot Added to Mushroom Foraging Facebook Group Immediately Gives Tips for Cooking Dangerous Mushroom	20241112	20241118
Graph-based AI model maps the future of innovation	20241112	20241118
After Trump’s Victory, the 4B Movement Is Spreading Across TikTok	20241107	20241118
The Open Source Project DeFlock Is Mapping License Plate Surveillance Cameras All Over the World	20241111	20241118
Guardian will no longer post on Elon Musk’s X from its official accounts	20241113	20241118
The AI lab waging a guerrilla war over exploitative AI	20241113	20241118
Our brains are vector databases — here’s why that’s helpful when using AI	20241116	20241118
ChatGPT is Slipping	20241117	20241118	N/A
Super Weights in LLMs - How Pruning Them Destroys a LLM’s Ability to Generate Text ?	20241118	20241118
Drop #9. UDO: The Weird Magic of Digital Folkore	20240119	20241118
AI and Ways of Seeing: Q&A with Lauren Tilton	20241112	20241118
AI-generated poetry is indistinguishable from human-written poetry and is rated more favorably - Scientific Reports	20241114	20241118	N/A
OpenAI, Google and Anthropic Are Struggling to Build More Advanced AI	N/A	20241118	N/A
Something weird is happening with LLMs and chess	20241114	20241118
The metaphors of artificial intelligence	N/A	20241118
When ads shock: subtle ways that disgust can shape our buying habits	N/A	20241118
Academic papers retracted due to … software licensing?	20241114	20241118
The ambiguous “use” / GioCities	20241115	20241118
How AI Could Break the Career Ladder	N/A	20241118
The New Hatred of Technology	20241115	20241118
Despite its impressive output, generative AI doesn’t have a coherent understanding of the world, researchers suggest	20241105	20241118
Misinformation really does spread like a virus, suggest mathematical models drawn from epidemiology	N/A	20241118
Sustainable Web Interest Group is Formed	20241104	20241118
Replicating GPT-2, the most dangerous NLP model (with Aaron Gokaslan) (Ep. 79)	20190923	20241118
Exploring Internet traffic shifts and cyber attacks during the 2024 US election	20241106	20241118
how a neuron learns	20240202	20241118
The images of Spain’s floods weren’t created by AI. The trouble is, people think they were	20241109	20241118
OpenAI’s new “Orion” model reportedly shows small gains over GPT-4	20241110	20241118
Teens learn a new conspiracy theory every week on social media, yet most schools aren’t teaching media literacy	N/A	20241118
IMG_0416	20241103	20241118
Anthropic hires its first “AI welfare” researcher	20241111	20241118
Apple AI notification summaries exist; rarely useful, often hilarious	20241112	20241118
Patterns in Information - Lachlan Gray	N/A	20241118
The Beginner’s Guide to Visual Prompt Injections: Invisibility Cloaks, Cannibalistic Adverts, and Robot Women / Lakera – Protecting AI teams that disrupt the world.	20241113	20241118
Releasing the largest multilingual open pretraining dataset	20241113	20241118
AI has a stupid secret: we’re still not sure how to test for human levels of intelligence	20241004	20241118
The Commoditization of LLMs – Communications of the ACM	20240912	20241118
AI search could break the web	20241031	20241118
The Internet Archive is even more essential than I realized	N/A	20241118
The Fairness of Fact-checking and Its Impact on Social Media / TechPolicy.Press	20241104	20241118
Generative AI Has a Massive E-Waste Problem	20241104	20241118
The Third-Party Script Breach That Shook The World	20201016	20241118
Seeing Like a Programmer (LambdaConf 2024) — Sympolymathesy, by Chris Krycho	20240507	20241118
AI overwhelmingly prefers white and male job candidates in new test of resume-screening bias	N/A	20241118
Despite its impressive output, generative AI doesn’t have a coherent understanding of the world	20241105	20241118
Why the deep learning boom caught almost everyone by surprise	20241105	20241118
A global dataset of 7 billion individuals with socio-economic characteristics - Scientific Data	20241007	20241118
No, The Web Is Not Dead	20240523	20241118
AI is consolidating corporate power in higher ed (opinion)	20241106	20241118
Google admits massive document leak related to search algorithm is authentic	20240530	20241118
For fame or a death wish? Kids’ TikTok challenge injuries stump psychiatrists	20241106	20241118
Project Analyzing Human Language Usage Shuts Down Because ‘Generative AI Has Polluted the Data’	20240919	20241118
NaNoWriMo Says Condemning AI Is ‘Classist and Ableist’	20240902	20241118
You Can Now See the Code That Helped End Apartheid	20241018	20241118
It’s time to retire the term “user”	20240419	20241118
Everything we know about ‘shadowbans’ on social media	N/A	20241118
Planes, trains, and smartphones	20241015	20241118
Are LLMs Any Good at Ranking People? – Wilsons Blog	20241018	20241118
AI could help people find common ground during deliberations	20241017	20241118
AI art: The end of creativity or the start of a new movement?	20241021	20241118
How AI is generating a ‘sea of sameness’ in job applications	20240908	20241118
You Should Probably Pay Attention to Tokenizers	20241021	20241118
Chatbot that caused teen’s suicide is now more dangerous for kids, lawsuit says	20241023	20241118
Former OpenAI Researcher Says Company Broke Copyright Law	20241023	20241118
Feds Say You Don’t Have a Right to Check Out Retro Video Games Like Library Books	20241025	20241118
Thoughts on the New Digital Feudalism	20241026	20241118
Open Source on its own is no alternative to Big Tech - Bert Hubert’s writings	20241026	20241118	N/A
Inside the U.S. Government-Bought Tool That Can Track Phones at Abortion Clinics	20241023	20241118
Instagram saves the best video quality for the most popular content	20241027	20241118
The Open Source AI Definition – 1.0	N/A	20241118
OpenAI says ChatGPT treats us all the same (most of the time)	20241015	20241118	N/A
Software freedom isn’t about licenses – it’s about power.	20210328	20241118
LinkedIn launches its first AI agent to take on the role of job recruiters / TechCrunch	20241029	20241118
Make It Ephemeral: Software Should Decay and Lose Data	20241030	20241118
Generative AI as an Icebreaker to Help Us Accept Other Ways of Thinking – Communications of the ACM	20241030	20241118
Embeddings are underrated	20241021	20241118
wrestling the web from corporate control requires making it boring again	20000101	20241118
How Everyone Got Lost in Netflix’s Endless Library	N/A	20241118
When Data Is Missing, Scientists Guess. Then Guess Again. / Quanta Magazine	20241002	20241118
Beyond the link tax: journalism and the changing nature of the internet - Halifax Examiner	20240917	20241118
Is big tech harming society? To find out, we need research – but it’s being manipulated by big tech itself	N/A	20241118
Man learns he’s being dumped via “dystopian” AI summary of texts	20241010	20241118
Open-source AI definition finally gets its first release candidate - and a compromise	20241009	20241118
Are humans the only ones that can be creative?	20241010	20241118
Cyber resilience act: Council adopts new law on security requirements for digital products	N/A	20241118	N/A
Amazon Dreams of AI Agents That Do the Shopping for You	20241009	20241118
FEMA adds misinformation to its list of disasters to clean up	20241008	20241118
TikTok executives know about app’s effect on teens, lawsuit documents allege	20241011	20241118
Maithra Raghu / The best AIs will be constructed not emergent	20240925	20241118
This AI Pioneer Thinks AI Is Dumber Than a Cat	N/A	20241118
The Editors Protecting Wikipedia from AI Hoaxes	20241009	20241118
Lessons from Plain Text / rugu	20241013	20241118
Stop aggregating away the signal in your data	20220303	20241118
People are using Google study software to make AI podcasts—and they’re weird and amazing	20241003	20241118
Tech Innovations to make the Tibetan Language a First-class Citizen in the Digital World - Buddhist Digital Resource Center	20241010	20241118	N/A
Reasoning failures highlighted by Apple research on LLMs	20241012	20241118
AI is the new plastic	20241002	20241118
What is Code?	N/A	20241118	N/A
AI-Powered Social Media Manipulation App Promises to ‘Shape Reality’	20241016	20241118
AI Avatars Are Doing Job Interviews Now	20240927	20241118
Google Serving AI-Generated Images of Mushrooms Could Have ‘Devastating Consequences’	20240924	20241118
RAG is not just text	20240928	20241118
An A.I. Model Helped Uncover 303 Previously Unseen Nazca Lines in Peru	20240927	20241118
If your AI seems smarter, it’s thanks to smarter human trainers	N/A	20241118
AI and globalisation are shaking up software developers’ world	N/A	20241118
New study reveals positive mood changes during video game play	20240925	20241118
How ‘Embeddings’ Encode What Words Mean — Sort Of / Quanta Magazine	20240918	20241118
Vanishing Culture: Preserving Cookbooks / Internet Archive Blogs	20240930	20241118
A.I. Pioneers Call for Protections Against ‘Catastrophic Risks’	20240916	20241118
The Modern CLI Renaissance	20240904	20241118
The 1970s librarians who revolutionised the challenge of search / Aeon Essays	20230605	20241118
Delving into “delve”	20240331	20241118
On Opting Out of Copyright	20240429	20241118
We’re losing our digital history. Can the Internet Archive save it?	20240916	20241118
Copyright Keepers Just Destroyed a Huge Digital Library	20240920	20241118
Technical writing is too important to leave to language models	20240709	20241118
Cold war spy satellites and AI detect ancient underground aqueducts	20240916	20241118
The Age of Software Artisans	N/A	20241118
Algorithms for the 21st Century	20060101	20241118
“Dead Internet theory” comes to life with new AI-powered social media app	20240918	20241118
The continuing tragedy of emoji on the web	20240917	20241118
Chatbots in science: What can ChatGPT do for you?	20240814	20241118	N/A
AI tool that can do ‘81 years of detective work in 30 hours’ trialled by police	20240923	20241118
Holy Hell, The Social Web Did Not Begin In 2008 - Bix Dot Blog	N/A	20241118
Please Don’t Ask AI If Something Is Poisonous	20240925	20241118
Google’s NotebookLM can help you dive deeper into YouTube videos	20240926	20241118
Project Overview ‹ AI-Implanted False Memories – MIT Media Lab	20240831	20241118
Greppability is an underrated code metric	20240829	20241118
Copyright Is Not a Tool to Silence Critics of Religious Education	20240828	20241118
The Imperial Origins of Big Data - Yale University Press	20240828	20241118
The /llms.txt file – llms-txt	20240903	20241118
Disappearing web and what to do about it.	20240816	20241118
Turn Your Code Into Pixel Art	20240102	20241118
US, Britain, EU to sign first international AI treaty	N/A	20241118
New AI model “learns” how to simulate Super Mario Bros. from video footage	20240905	20241118
GPT-fabricated scientific papers on Google Scholar: Key features, spread, and implications for preempting evidence manipulation / HKS Misinformation Review	20240903	20241118
Google is losing its status as a verb	20240906	20241118
Bitten by Unicode – pyATL	20240901	20241118
LLMs produce racist output when prompted in African American English	20240828	20241118
Jeremy Couillard’s video games capture what it’s like to be alive right now	20240209	20241118
ON ALGORITHMIC WAGE DISCRIMINATION - Columbia Law Review	20231120	20241118
Inrupt, Tim Berners-Lee’s Solid, and Me	20200824	20241118
Can You Trust Dr. Wikipedia?	20240906	20241118
An AI Bot Named James Has My Old Local News Job	N/A	20241118
Facebook admits to scraping every Australian adult user’s public photos and posts to train AI, with no opt-out option	20240910	20241118
The Network is the Territory	20240908	20241118
Grounding AI in reality with a little help from Data Commons	20241111	20241118
Did ChatGPT just message me… First?	N/A	20241118
The Tao of Unicode Sparklines	20210805	20241118
Move over, text: Video is the new medium of our lives	20240824	20241118
AI and the future of sex	20240826	20241118
Here’s how people are actually using AI	20240812	20241118
We need to prepare for ‘addictive intelligence’	20240805	20241118
A new public database lists all the ways AI could go wrong	20240814	20241118
The race to save our online lives from a digital dark age	20240819	20241118
How gamification took over the world	20240613	20241118
Algorithms are everywhere	20240227	20241118
Wikimedia’s CTO: In the age of AI, human contributors still matter	20240226	20241118
The online art catalogue that chronicles a stolen African heritage	20240104	20241118
Recapturing early-internet whimsy with HTML	20231221	20241118
The grassroots push to digitize India’s most precious documents	20231025	20241118
Stephen Wolfram thinks we need philosophers working on big questions around AI / TechCrunch	20240825	20241118
The Psychology of Immersion in Video Games	20100728	20241118
Research shows more than 80% of AI projects fail, wasting billions of dollars in capital and resources: Report / Tom’s Hardware	20240828	20241118
When It Comes to Artificial Intelligence, ‘Big Data’ Isn’t Everything	20240828	20241118
I spent an evening on a fictitious web	20240828	20241118
How to build a terrible RAG system - jxnl.co	20240107	20241118
Under Meredith Whittaker, Signal Is Out to Prove Surveillance Capitalism Wrong	20240828	20241118
Rearchiving 2 million hours of digital radio, a comprehensive process	20240828	20241118
Rediscovering the Small Web - Neustadt.fr	20200525	20241118
Google Thinks Beethoven Looks Like Mr. Bean	20240830	20241118
A new way to build neural networks could make AI more understandable	20240830	20241118
Why A.I. Isn’t Going to Make Art	N/A	20241118
Chatbots Are Primed to Warp Reality	20240830	20241118
Political posts on X could harm academics’ credibility, new study finds	20240828	20241118
What we can learn from vintage computing	20221213	20241118
Artificial intelligence is losing hype	N/A	20241118
Against nostalgia in computing	N/A	20241118
No one’s ready for this	20240822	20241118
Was Linguistic A.I. Created by Accident?	N/A	20241118
A Short History of Glitch Art: From Inception to the Present Day	N/A	20241118
Facebook Banned Me for Life Because I Help People Use It Less	20211007	20241118
More than calculators: Why large language models threaten learning, teaching, and education	N/A	20241118
Olivetti Programma 101: at the origins of the Personal Computer / Inexhibit	20170212	20241118
Capt. Grace Hopper on Future Possibilities: Data, Hardware, Software, and People (1982)	20240826	20241118
Inside the long quest to advance Chinese writing technology	20240826	20241118
What happened to BERT & T5? On Transformer Encoders, PrefixLM and Denoising Objectives — Yi Tay	20240716	20241118
We need new metaphors that put life at the centre of biology / Aeon Essays	20240712	20241118
The Elegance of the ASCII Table	20240721	20241118
Data For The Ages, Take Two	20201024	20241118
Switzerland now requires all government software to be open source	20240729	20241118
The Data That Powers A.I. Is Disappearing Fast	20240719	20241118
Why AI Model Collapse Due to Self-Training Is a Growing Concern	20240724	20241118
Dirty Secrets of BookCorpus, a Key Dataset in Machine Learning	N/A	20241118
Tiktok LLM	20240624	20241118
The bizarre secrets I found investigating corrupt Winamp skins	20240724	20241118
Open File format in data analytics and AI - changing the international rules game	20240727	20241118
Data from deleted GitHub repos may not really be deleted	20240725	20241118
Ethics of Local LLMs: A Response to Zuckerberg’s “Open Source AI Manifesto”	20240725	20241118
The Backlash Against AI Scraping Is Real and Measurable	20240723	20241118
New study on AI-assisted creativity reveals an interesting social dilemma	20240728	20241118
Reimagining the Semantic Web: UCL’s Innovative Synthesis of AI and Web Science - Browser London	20240729	20241118
How embedding models encode semantic meaning	20240803	20241118
To preserve their work — and drafts of history — journalists take archiving into their own hands	20240731	20241118
Free Software Needs Free Tools :: Benjamin Mako Hill	20100604	20241118
Has the AI bubble burst? Wall Street wonders if artificial intelligen…	20240804	20241118
Debates on the nature of artificial general intelligence / Science	20240701	20241118
Myspace celebrates its 21st birthday. Do we still need it? / TribLIVE…	20240806	20241118	N/A
The Great Open Source Shake-up	20190908	20241118
Google and Meta struck secret ads deal to target teenagers	20240808	20241118
Demo: Predicting social science experimental results using LLMs	N/A	20241118
AI and the techno-utopian path not taken	20240801	20241118
Is It Time To Version Observability? (Signs Point To Yes)	20240807	20241118
How Algorithms Keep Workers Under Their Control	20240805	20241118
Cannibal AIs Could Risk Digital ‘Mad Cow Disease’ Without Fresh Data	20240806	20241118
Excess memes and ‘reply all’ emails are bad for climate, researcher warns	20240809	20241118
Research AI model unexpectedly modified its own code to extend runtime	20240814	20241118
Code as Art	20240817	20241118
Markov chains are funnier than LLMs	20240818	20241118
What If Data Is a Bad Idea?	20240818	20241118
OpenAI’s latest model will block the ‘ignore all previous instructions’ loophole	20240719	20241118
LLMs Know More Than What They Say	20240815	20241118
Whatever Happened to the Semantic Web?	20180527	20241118
On the cruelty of really teaching computing science (EWD 1036)	20090512	20241118
The 𝕆ᗪ⒟𝙞ȶч of Unicode Homoglyphs	N/A	20241119
ChatGPT outperforms undergrads in intro-level courses, falls short later	20240628	20241119
The telltale words that could identify generative AI text	20240701	20241119
Study reveals why AI models that analyze medical images can be biased	20240628	20241119
What I’ve learned about Open Source community over 30 years - OpenSource.net	20240629	20241119
Design as Thought: AI and the Future of Design	20240608	20241119
Google: AI Potentially Breaking Reality Is a Feature Not a Bug	20240703	20241119
Free and Open Source Software–and Other Market Failures – Communications of the ACM	20240703	20241119
Ever put content on the web? Microsoft says that it’s okay for them to steal it because it’s ‘freeware.’	20240628	20241119
How Good Is ChatGPT at Coding, Really?	20240706	20241119
Closing the Gap in Non-Latin-Script Data: A tool for building and navigating collections of DH research projects	20230809	20241119
Scripts, Transliteration, and Computer Access	19970101	20241119
Vision language models are blind	20000101	20241119
Is AI the beginning of the democratization of creativity?	20241119	20241119
We need visual programming. No, not like that.	20240101	20241119
Google Now Defaults to Not Indexing Your Content	20240715	20241119
Exploring the vastness of a website — Elliott’s Computer	20190818	20241119
It May Soon Be Legal to Jailbreak AI to Expose How it Works	20240718	20241119
Want to spot a deepfake? Look for the stars in their eyes	20240717	20241119
What Is ChatGPT Doing … and Why Does It Work?	20230214	20241119
Large language model data pipelines and Common Crawl (WARC/WAT/WET)	20230604	20241119
All the Data on Earth Can Fit in a Cup Full of DNA. This Is MIT’s Jurassic Park-Inspired Project	20240618	20241119
AI’s Brain Drain	20240603	20241119
Toolkits for the Mind	20150402	20241119
Why your brain is 3 milion more times efficient than GPT-4 - dead simple introduction to Embeddings, HNSW, ANNS, Vector Databases and their comparison based on experience from production project	20230722	20241119
AI Is Already Wreaking Havoc on Global Power Systems	N/A	20241119
A Third-World Critique of the Human Rights-Based Approach to Content Moderation / TechPolicy.Press	20240623	20241119
Surfing the (Human-Made) Internet	20240528	20241119
Human neuroscience is entering a new era — it mustn’t forget its human dimension	20240619	20241119
What the internet looked like in 1994, according to 15 webpages born that year	N/A	20241119
Measuring the Growth of the Web	19950101	20241119
Could AI Achieve General Intelligence, and What Would That Even Mean?	20240625	20241119
Researchers upend AI status quo by eliminating matrix multiplication in LLMs	20240625	20241119
Pokémon Go Players Have Unwittingly Trained AI to Navigate the World	20241119	20241123
Testing AI on language comprehension tasks reveals insensitivity to underlying meaning - Scientific Reports	20241114	20241123
The macOS LC_COLLATE hunt - Zhiming Wang	20200603	20241123
down in the posting mines / poking at ghosts	20241122	20241123
How OpenAI stress-tests its large language models	20241121	20241123
For Teens Online, Conspiracy Theories Are Commonplace. Media Literacy Is Not. - EdSurge News	20241107	20241123
Remembering Cyberia, the World’s First Ever Cyber Cafe	20241121	20241123
Autopoietic Networks	20150201	20241124
The Fantasy of Cozy Tech	20241120	20241124
‘All of a Sudden, Joe Blow Can See the CEO’s Emails’	20241121	20241124
Understanding the EU AI Act’s Impact and Ripple Effects in the US	20241008	20241124
Creating a public counterpoint for AI / The Mozilla Blog	20241002	20241124
Emoji history: the missing years	20240510	20241124
How tech giants cut corners to harvest data for AI	20240406	20241124
Autism & the Internet will defeat the Monoculture	20240512	20241124
Exploring Hacker News by mapping and analyzing 40 million posts and comments for fun	20240410	20241124
The British Library hack is a warning for all academic libraries	20240319	20241124
Transformers Are What You Do Not Need	N/A	20241124
The Website Obesity Crisis	20150720	20241124
Open Source Is at a Crossroads	20240507	20241124
The Small Web and Science	20240514	20241124
Using Simple Tools as a Radical Act of Independence	20241118	20241124
Why neural networks struggle with the Game of Life - TechTalks	20200916	20241124
How to Use GitHub Actions to Automate Data Scraping	N/A	20241124
State of Compute Access: How to Bridge the New Digital Divide	20231207	20241124
Indian Voters Are Being Bombarded With Millions of Deepfakes. Political Candidates Approve	20240520	20241124
When Online Content Disappears	20240517	20241124
You Don’t Own Your Content on the Internet. You Never Have.	20240521	20241124
Do text embeddings perfectly encode text?	20240305	20241124
A Brief Overview of Gender Bias in AI	20240408	20241124
An Introduction to the Problems of AI Consciousness	20230930	20241124
What Do LLMs Know About Linguistics? It Depends on How You Ask	20230709	20241124
Grounding Large Language Models in a Cognitive Foundation: How to Build Someone We Can Talk To	20230415	20241124
Large Language Model: world models or surface statistics?	20230121	20241124
Here’s what’s really going on inside an LLM’s neural network	20240522	20241124
Google Is Paying Reddit $60 Million for Fucksmith to Tell Its Users to Eat Glue	20240523	20241124
Meta is using your Instagram and Facebook photos to train its AI models	20240511	20241124
The Danger Of Superhuman AI Is Not What You Think / NOEMA	20240523	20241124
What Science Forgets	20240523	20241124
An Evolving Sixth Sense for AI	20240525	20241124
Facebook users say ‘amen’ to bizarre AI-generated images of Jesus	20240319	20241124
No, Today’s AI Isn’t Sentient. Here’s How We Know	20240522	20241124
To the brain, reading computer code is not the same as reading language	20201215	20241124
Partial Regurgitation and how LLMs really work	20240523	20241124
Big Data is Dead	20230207	20241124	N/A
What Comes After Open Source	20241024	20241124	N/A
How Many People Are Addicted to Social Media?	N/A	20241124	N/A
The next wave of AI hype will be geopolitical. You’re paying	20240529	20241124
Indexing all of Wikipedia, on a laptop	20240529	20241124
Engineering for Slow Internet – brr	20240530	20241124
Understanding Large Language Models – A Transformative Reading List	20230207	20241124
1-bit LLMs Could Solve AI’s Energy Demands	20240530	20241124
Tiny number of ‘supersharers’ spread the vast majority of fake news	N/A	20241124
FineWeb: decanting the web for the finest text data at scale - a Hugging Face Space by HuggingFaceFW	20240523	20241124
What AI thinks a beautiful woman looks like	20240531	20241124
An Overview of the Textual Data Analysis Workflow	20210401	20241124
domm / Perl / Chopping UTF-8	20240604	20241124
How Online Privacy Is Like Fishing	20240603	20241124
The Backrooms of the Internet Archive	20240601	20241124
After Social Media	20200106	20241124
Inside LLMs: understanding tokens - Generative AI France	20240610	20241124
An Anonymous-Messaging App Upended This High School - WSJ	20240610	20241124	N/A
Researchers Say There’s a Vulgar But More Accurate Term for AI Hallucinations	20240610	20241124
We need a social science of data	20240612	20241124	N/A
Hacker Theory - Journal #146	20131222	20241124
Good code is rarely read	20240606	20241124
Lies, Damned Lies, and Data Science	N/A	20241124
Ghosts in the ROM	N/A	20241124
How we Chunk - turning PDF’s into hierarchical structure for RAG	N/A	20241124
Coding a Neural Network from Scratch for Absolute Beginners	N/A	20241124
Overcoming the limits of current LLM	20240718	20241124
Demystifying cookies and tokens – Tommi Hovi	20240502	20241124
Scrape like a pro… but not like an AI company	20240729	20241124
I investigated millions of tweets from the Kremlin’s ‘troll factory’ and discovered classic propaganda techniques reimagined for the social media age	N/A	20241124
Breaking out of VRChat using a Unity bug	20241123	20241124
LLMs Aren’t Just “Trained On the Internet” Anymore	20240531	20241124
What are embeddings?	N/A	20241124
The YouTube Algorithm and Manufacturing Consent	20241117	20241124	N/A
Engines of Engagement – A Curious Book About Generative AI	20231018	20241124
The Iterative Paraphrasing Experiment: How GenAI Morphs a Story Over 100 Rewrites	N/A	20241125
Writing around an AI taboo	20240306	20241125
Data centers powering artificial intelligence could use more electricity than entire cities	20241123	20241125
It’s Surprisingly Easy to Jailbreak LLM-Driven Robots	20241111	20241125
The WTF-8 encoding	20220223	20241125
Do Coding Boot Camps Make Sense in an A.I. World?	20241125	20241125
Documenting the Assault on Disinformation and Hate Speech Research / TechPolicy.Press	20241124	20241125
‘Thirsty’ ChatGPT uses four times more water than previously thought	20241004	20241125
A City Is Not a Computer	N/A	20241125
Our Transparent Future	N/A	20241125
Here’s a ‘Brand-New’ Massive Multilingual Dataset for Machine Translation	20240403	20241125
#youtubepick The Enshittification of Internet is Here - Why and How?	20240417	20241125
Programming Is Mostly Thinking	20140929	20241125
Self-Reasoning Tokens, teaching models to think ahead.	20240420	20241125
The Tyranny of Content Algorithms	20240407	20241125
Lost Language of the Machines	20200101	20241125
Torching the Modern-Day Library of Alexandria	20170420	20241125
Exploring Small Language Models	20240424	20241125
The Man Who Killed Google Search	20240423	20241125
Source Code With Emoji	20240424	20241125
What can we learn from ChatGPT jailbreaks?	20240819	20241125
The Person Saving The Media You Love Is You - Aftermath	20240426	20241125
ChatGPT provides false information about people, and OpenAI can’t correct it	20240429	20241125
Mistakes that data science students make	20240428	20241125
You can’t just assume UTF-8	20240429	20241125
Understanding Software – Ceejbot’s notes	20240329	20241125
LLMs Can’t Do Probability - Brainsteam	20240501	20241125
New EU rules needed to address digital addiction / News / European Parliament	20231212	20241125
93% of Paint Splatters are Valid Perl Programs	20230101	20241125	N/A
THE NATURE OF CODE	20210515	20241125	N/A
The User Is On Their Own / selfaware soup	20240501	20241125	N/A
Humans share the web equally with bots, report warns amid fears of ‘dead internet’	20240417	20241125
Machine Unlearning in 2024	20231227	20241125
Decoding UTF8 with Parallel Extract	20171006	20241125
How are Embeddings Affecting Traditional Text Search?	20240506	20241125
Add Bluetooth to the Long List of Border Surveillance Technologies	20240506	20241125
The Antisocial Network: How the 90s Internet Died Like Diaryland	20240729	20241125
40 years later, a game for the ZX Spectrum will be once again broadcast over FM radio - Računalniški muzej	20240508	20241125
OpenAI destroyed a trove of books used to train AI models. The employees who collected the data are gone.	20240507	20241125
Stack Overflow bans users en masse for rebelling against OpenAI partnership — users banned for deleting answers to prevent them being used to train ChatGPT / Tom’s Hardware	20240508	20241125
Cartography of generative AI	20230101	20241125
Navigating the World of Large Language Models	20230529	20241127
How Chain-of-Thought Reasoning Helps Neural Networks Compute / Quanta Magazine	20240321	20241127
The linguistics search engine that overturned the federal mask mandate	20220607	20241127
We Need to Decarbonize Software	20240323	20241127
How Quickly Do Large Language Models Learn Unexpected Skills? / Quanta Magazine	20240213	20241127
The Lost Worlds of Telnet	20190310	20241127
A New Age of Enlightenment? A New Threat to Humanity?: The Impact of Artificial Intelligence by 2040 - Imagining the Digital Future Center	20240219	20241127
‘Collective AI’ expected to resemble Star Trek’s Borg — only nicer (hopefully)	20240327	20241127
Age Verification Laws Drag Us Back to the Dark Ages of the Internet	20240325	20241127
AI Narratives: On Screen! (Part 1)	20240402	20241127
Bernard Stiegler’s philosophy on how technology shapes our world / Aeon Essays	20240401	20241127
Understanding and managing the impact of Machine Learning models on the Web	20240820	20241127
Google Books Is Indexing AI-Generated Garbage	20240404	20241127
A Student’s Guide to Not Writing with ChatGPT	20241114	20241128
Someone Made a Dataset of One Million Bluesky Posts for ‘Machine Learning Research’	20241126	20241128
Looking for the Answer to the Question, “Do I Really Own the Digital Media I Paid For?”	20241126	20241128
A Revolution in How Robots Learn	20241111	20241128
How the Internet Archive’s “Free Digital Library” fell to the “fair use” test	20241119	20241128
Hackers, Wizards of the Electronic Age : Fabrice Florin : Free Download, Borrow, and Streaming : Internet Archive	20180819	20241128	N/A
Yes, That Viral LinkedIn Post You Read Was Probably AI-Generated	20241126	20241128
Five ways you might already encounter AI in cities (and not realise it)	N/A	20241128
About Ethnographic Data Visualization – The Side Unseen	20240101	20241128
OkCupid Study Reveals the Perils of Big-Data Science	20160514	20241128
Japanese scientists were pioneers of AI, yet they’re being written out of its history	N/A	20241129
Reddit overtakes X in popularity of social media platforms in UK	20241128	20241129
Conversational Game Theory – Collective Intelligence Engine for Ai and Humans	20240402	20241129
AI can now create a replica of your personality	20241120	20241129
The trouble with openness	20241127	20241129
Details matter with open source models	20241121	20241129
Smarter than GPT-4: Claude 3 AI catches researchers testing it	20240305	20241129
Sask. appeal court reserves decision on whether thumbs-up emoji can lock in $82K contract / CBC News	20240307	20241129
VR headsets can be hacked with an Inception-style attack	20240311	20241129
Advanced LLM AI models vs A Simple Question	20240313	20241129
What I learned from looking at 900 most popular open source AI tools	20240314	20241129
How bad are search results? Let’s compare Google, Bing, Marginalia, Kagi, Mwmbl, and ChatGPT	20200113	20241129
Evaluating Human Factors Beyond Lines of Code	20241121	20241129
The young people sifting through the internet’s worst horrors	20240111	20241130
Machine forgetting: How difficult it is to get AI to forget	N/A	20241130
Anthropic researchers find that AI models can be trained to deceive	20240113	20241130
Understanding ourselves through AI: a new frontier in personality assessment	20240114	20241130
Git branches as a social construct	20240114	20241130
Google Search Really Has Gotten Worse, Researchers Find	20240116	20241130
why lowercase letters save data	20231125	20241130
How Much of the World Is It Possible to Model?	20240115	20241130
Online Communication	20240120	20241130
Nightshade, the free tool that ‘poisons’ AI models, is now available for artists to use	N/A	20241130
Learn by Doing: How LLMs Should Reshape Education	20240122	20241130
A Shocking Amount of the Web is Machine Translated: Insights from Multi-Way Parallelism	20240111	20241130
What Home Videotaping Can Tell Us About Generative AI	20240124	20241130
Social Media, AI, and the Battle for Your Brain	20231221	20241130
Why Is the Web So Monotonous? Google.	20220804	20241130
Markov Chains Are The Original Language Models	20231011	20241130
Beyond Self-Attention: How a Small Language Model Predicts the Next Token	20240201	20241130
First-Gen Social Media Users Have Nowhere to Go	20231106	20241130
Could AI Disrupt Peer Review?	20240206	20241130
Art in the age of ones and zeros: Datamoshing	20170302	20241130
A search engine in 80 lines of Python	20240205	20241130
Homesteading the Noosphere	20020802	20241130
ChatGPT knows things that Google doesn’t	20240125	20241130
The Internet Is Being Ruined by Bloated Junk	20240115	20241130
Thinking about High-Quality Human Data	20240205	20241130
Video Games Are Mourning the Old, Weird, Clunky Internet	20240205	20241130
When Words Cannot Describe: Designing For AI Beyond Conversational Interfaces	20240202	20241130
AI Reveals Hotspots of Climate Denial	20240214	20241130
How a ragtag band of internet friends became the best at forecasting world events	20240213	20241130
Phallocentricity in GPT-J’s bizarre stratified ontology	20240217	20241130
The rise and fall of robots.txt	20240214	20241130
Subprime Intelligence	20240219	20241130
New report: 60% of OpenAI model’s responses contain plagiarism	N/A	20241130
Data will not tell you what to do	20240221	20241130
Can a programming language implement time travel?	20240212	20241130
Vending machine error reveals secret face image database of college students	20240223	20241130
Resurrecting loved ones as AI ‘ghosts’ could harm your mental health	20240226	20241130
Weapons of Mass Hate Dissemination: The Use of Artificial Intelligence by Right-Wing Extremists - GNET	20240223	20241130	N/A
The internet turned into a crowded mall. Now you need a corner shop. / Pith & Pip	20240628	20241130	N/A
Practico-inertia	20240301	20241130
Millions of research papers at risk of disappearing from the Internet	20240304	20241130
Author Cory Doctorow has a theory about why all tech and social platforms eventually decline	20240303	20241130
The Ideal Social Network	20231023	20241130
The women who coined the expression ‘Surfing the Internet’	20240603	20241130
Screen time robs average toddler of hearing 1,000 words spoken by adult a day, study finds	20240304	20241130
A Bug in Early Creative Commons Licenses Has Enabled a New Breed of Superpredator	N/A	20241130
AI Prompt Engineering Is Dead	20240306	20241130
Atlas of internet surveillance maps ownership of network infrastructures worldwide	20240305	20241130
Open-source champion Kelsey Hightower on the promise of Bluesky	20241126	20241201
Why We See Digital Ads After Talking About Something / McNutt & Partners	20210125	20241201
‘His Facebook was a shrine to my face’: the day I caught my catfish	20241130	20241201
Thoughts on the software industry	20220803	20241202
Open Source AI Definition Erodes the Meaning of “Open Source”	20241031	20241202
The Guy Behind the Most Nostalgic Sites on the Internet	20241129	20241202
Modelling Historical Information with Structured Assertion Records	20241129	20241202
The Myth of Objective Data	20230417	20241202
An Open Source Python Library for Anonymizing Sensitive Data - Scientific Data	20241126	20241202
Can Google Scholar survive the AI revolution?	20241119	20241203
You Have One Voice / Hazel Weakly	20240101	20241203
‘Brain rot’ named Oxford Word of the Year 2024 - Oxford University Press	20241202	20241203
How AI Log Analysis Is Shaping Observability’s Future	20241122	20241203
Can a Comma Solve a Crime?	20241121	20241203
How ChatGPT Search (Mis)represents Publisher Content	N/A	20241203
The Evolution of Machine Translation: A Brief History and What’s Coming Next	N/A	20241203	N/A
Privacy Disasters: FaceHuggers Are Eating Your Skeets	20241202	20241203	N/A
What is Software Anyways? Where Does it Exist?	20240101	20241204
Why an Octopus-like Creature Has Come to Symbolize the State of A.I.	20230530	20241204
New datasets will train AI models to think like scientists	20241202	20241204
Social media algorithms can change your views in just a single day	20241128	20241204
Combining linguistics, archaeology and ancient DNA genetics to understand deep human history	N/A	20241204
Your Bluesky Posts Are Probably In A Bunch of Datasets Now	20241203	20241204
Opinion: Students’ tech skills should be nurtured, not punished	20241130	20241204
The Beginning of the End of Big Tech	20241126	20241205
Teaching Critical Reasoning with AI: Humiliation Rituals	20241204	20241205
Something’s Rotten with the State of Our Archives.	20241110	20241205
Social Media is Disproportionately Hurting Girls	20241204	20241206
She Joined Facebook to Fight Terror. Now She’s Convinced We Need to Fight Facebook.	20241204	20241207
Ethical Web Scraping: Legal Insights and Best Practices	20241206	20241207
On the foolishness of “natural language programming”. (EWD 667)	20101119	20241207
Your Internet Shouldn’t Be My Internet	20241129	20241207
Information is useful if it’s re-usable	20241208	20241209
Report: Tokyo University Used “Tiananmen Square” Keyword to Block Chinese Admissions - Unseen Japan	20241207	20241209
Feeling Outraged? Think Twice Before Hitting “Share.”	20241201	20241210
Researchers Use AI To Turn Sound Recordings Into Accurate Street Images	20241127	20241210
How humans write programs	20180116	20241210
How WhatsApp ate the world	20241209	20241211
Sorry, Social Media Is Never Getting Any Better	20241209	20241211	N/A
15 Times to use AI, and 5 Not to	20241209	20241211
The Hottest New Coding Language is … English	20241210	20241211
Open source projects drown in bad bug reports penned by AI	20241210	20241212
Researchers reduce bias in AI models while preserving or improving accuracy	20241211	20241212	N/A
The Need to Make Content Moderation Transparent / TechPolicy.Press	20241211	20241212	N/A
Stop using generative AI as a search engine	20241205	20241212
The Paradox of the Internet	20240411	20241214
Integration of Music and Art in a Science and Engineering-Based University	20241210	20241214
Do I Really Own the Digital Media I Bought?	20180917	20241215
Cascading collapse of online social networks - Scientific Reports	20171201	20241215	N/A
How Silicon Valley is disrupting democracy	20241213	20241215
Computing inside an AI / Will Whitney	20241212	20241216
We’ve Been Here Before	20241214	20241217
PLATO: How an educational computer system from the ’60s shaped the future	20230317	20241218
Where did mainstream media come from? - The History of the Web	20241210	20241218
AI and Internet Hygiene	20241001	20241219
Spain introduces bill to combat online fake news	20241217	20241219
EU opens investigation into TikTok over election interference	N/A	20241219
Anti-hype LLM reading list	20230820	20241222
Encoding Differentials: Why Charset Matters	20240715	20241223
Gemini - So you may be breaking copyright law.	20240529	20241226	N/A
How AI deepfakes polluted elections in 2024	20241221	20241228
Linguistics - A test to measure AI intelligence	20241227	20241228
What Statistics Can and Can’t Tell Us About Ourselves	20190828	20241229
The old-new epistemology of digital journalism: how algorithms and filter bubbles are (re)creating modern metanarratives - Humanities and Social Sciences Communications	20230710	20241229
More Than Half of All Google Search Takedowns Now Come from Link-Busters * TorrentFreak	20241230	20250102
Why Khanmigo (and Other Learning Chatbots) Will Fail - BetterSchooling	20241213	20250102
Kids can’t use computers… and this is why it should worry you	20130729	20250105
Lockdown. The coming war on general-purpose computing	N/A	20250108
Don’t use cosine similarity carelessly	20250114	20250116
OpenAI’s AI reasoning model ‘thinks’ in Chinese sometimes and no one really knows why / TechCrunch	20250114	20250116
OpenAI used this subreddit to test AI persuasion / TechCrunch	20250131	20250202
Putting DeepSeek to the test: how its performance compares against other AI tools	N/A	20250205
Why AI Is A Philosophical Rupture / NOEMA	20250204	20250210
Your AI can’t see gorillas – Chiraag Gohel	20250205	20250210
Microsoft Study Finds AI Makes Human Cognition “Atrophied and Unprepared”	20250210	20250212
New hack uses prompt injection to corrupt Gemini’s long-term memory	20250211	20250214
Turkey’s translators are training the AI tools that will replace them	20250220	20250222
‘Indiana Jones’ jailbreak approach highlights the vulnerabilities of existing LLMs	20250220	20250225
The difference between tokens and words	20250307	20250313
ChatGPT tokens and Unicode	20250308	20250313
Why extracting data from PDFs is still a nightmare for data experts	20250311	20250313
Anthropic can now track the bizarre inner workings of a large language model	20250327	20250330
Why do LLMs make stuff up? New research peers under the hood.	20250328	20250330
Build your own tools (even if you reinvent the wheel)	20250511	20250901
AI crawlers destroying websites in hunger for content	20250829	20250901
Every question you ask, every comment you make, I’ll be recording you	20250818	20250901
Google is killing the open web	20250817	20250901
I Tested How Well AI Tools Work for Journalism	N/A	20250901
AI Cannibalism Can Be Good, by Gwern · Gwern.net	20250427	20250901
Everything Is Correlated · Gwern.net	20140912	20250901
LLMs Are Biased! Here’s Why Enterprises Can’t Afford to Just Plug and Pray	20250823	20250901
YouTube’s Sneaky AI ‘Experiment’	20250822	20250901
Culture Has No Name for This Cursed Vibe. It’s Everywhere	N/A	20250901
The Default Trap: Why Anthropic’s Data Policy Change Matters	20250830	20250901
New AI attack shows how images hide secret commands, letting hackers siphon private data directly from unsuspecting chatbot users	20250831	20250901
Lossy encyclopedia	20250827	20250903
What are embeddings?	N/A	20250903
The Last Days Of Social Media / NOEMA	20250902	20250904
We traded blogs for black boxes, now we’re paying for it	20250909	20250913
Why do LLMs freak out over the seahorse emoji?	20251004	20251007
The Essence of Prompt Engineering is the Art of Asking Questions	20251025	20251026
How AI and Wikipedia have sent vulnerable languages into a doom spiral	20250925	20251027
Largest study of its kind shows AI assistants misrepresent news content 45% of the time – regardless of language or territory	20251022	20251027
A small number of samples can poison LLMs of any size	20251009	20251104
OII / Study identifies weaknesses in how AI systems are evaluated	20160706	20251110
Evidence That Humans Now Speak in a Chatbot-Influenced Dialect Is Getting Stronger	20251207	20251212
Prompts are (not) the new source code - Quesma Blog	20260109	20260110