Publications by Stephen Turner
Use an LLM to translate help documentation on-the-fly
Reposted from Paired Ends at https://blog.stephenturner.us/p/llm-translate-documentation. The lang package overrides the ? and help() functions in your R session. The translated help page will appear in the help pane in RStudio or Positron. It can also translate your Roxygen documentation. Using LLMs in R: Most of the developer tooling for AI/LL...
6062 sym R (327 sym/7 pcs) 20 img
Turn a GitHub repo into a single text file for LLM-friendly input (repost)
Reposted from the original at https://blog.stephenturner.us/p/github-repo-to-text-for-llm-input. If you use ChatGPT, Claude, or even some local model through Ollama or HuggingFace Assistants, you’ll know that the chat interface makes it challenging to feed in an entire repo like a Python or R package, because functions, tests, etc. can be...
4500 sym Python (491 sym/1 pcs) 16 img
Tech I’m thankful for (repost)
Reposted from https://blog.stephenturner.us/p/tech-im-thankful-for-2024. Data science and bioinformatics tech I’m thankful for in 2024: tidyverse, RStudio, Positron, Bluesky, blogs, Quarto, bioRxiv, LLMs for code, Ollama, Seqera Containers, StackOverflow, … It’s a short week here in the US. As I reflect on the tools that shape modern bioinforma...
5283 sym
Expand your Bluesky network with R (repost)
This is reposted from the original at https://blog.stephenturner.us/p/expand-your-bluesky-network-with-r. I’m encouraging everyone I know online to join the scientific community on Bluesky (see “Bluesky for Science”, Stephen Turner, Nov 16). In that post I link to several starter packs: lists of accounts posting about a topic that you ca...
2630 sym R (1560 sym/2 pcs) 6 img
Build a Python CLI with Click+Cookiecutter (repost)
Reposted from the original at https://blog.stephenturner.us/p/python-cli-click-cookiecutter. In the spirit of Learning in Public, I wanted an excuse to explore (1) click for creating command line interfaces, (2) Cookiecutter project templates, and (3) modern tools in the Python packaging ecosystem. If you’re primarily an R developer lik...
10319 sym Python (1799 sym/15 pcs) 6 img
Python for R users (repost)
This is reposted from the original at https://blog.stephenturner.us/p/python-for-r-users. A Google search for “R vs Python” returns thousands of hits across sites like Reddit, IBM, Datacamp, Coursera, Kaggle, and many others. A quick Google Trends analysis shows that this search query has grown steadily over the last decade. Google Trend anal...
9741 sym 6 img
Use nanoparquet instead of readr/CSV
This is reposted from the original at https://blog.stephenturner.us/p/use-nanoparquet-instead-of-readr-csv. Parquet is interoperable between Python and R, fast to read+write, works well with databases, and stores complex data types (e.g., tibble listcols). Use it instead of CSV. Many pros, few (no?) cons. Yesterday I wrote about base R vs. dplyr vs....
5061 sym R (3697 sym/10 pcs) 4 img
DuckDB vs dplyr vs base R
Reposted from https://blog.stephenturner.us/p/duckdb-vs-dplyr-vs-base-r. TL;DR: For a very simple analysis (means by group on 100M rows), duckdb was 125x faster than base R, and 28x faster than readr+dplyr, without having to read data from disk into memory. The duckplyr package wraps DuckDB’s analytical query processing techniques in a dplyr-comp...
7717 sym R (2449 sym/7 pcs) 4 img
Create a free Llama 3.1 405B-powered chatbot on any GitHub repo in 1 minute (cross-posted from Paired Ends)
This blog has moved. This is reposted from Paired Ends: https://blog.stephenturner.us/p/create-a-free-llama-405b-llm-chatbot-github-repo-huggingface. Llama 3.1 405B is the first open-source LLM on par with frontier models GPT-4o and Claude 3.5 Sonnet. I’ve been running the 70B model locally for a while now using Ollama + Open WebUI, but you’...
4088 sym 8 img
PLANES: Plausibility Analysis of Epidemiological Signals
This blog has moved. This is reposted from Paired Ends: https://blog.stephenturner.us/p/planes-plausibility-analysis-of-epidemiological-signals-rplanes-r-package. PLANES provides a set of methods for evaluating the plausibility of epidemiological signals and forecasts. The PLANES methods are available in the rplanes R package and Shiny app. Motivatio...
12375 sym 20 img