References
Relates to:
- [[Software Engineering]] (contains learning resources specifically for software engineering topics)
- [[00 - Data Engineering]] (contains learning resources specifically for data engineering topics) –> online wiki: Data Engineering Wiki
Interesting sites:
- https://maximiliankiener.com/12/
- Brilliant Courses –> for beginning lifelong learner
- neal.fun –> funny little game
- tools for better thinking
- less wrong
- Interesting website for fun
- https://tinyclouds.org/
- the decision lab
- commoncog’s best post
Others’ blogs/notes:
Personal blogs that I can read, learn from, and borrow ideas from:
- Paul Graham: Essays
- Aaron Swartz:
  - By category: The Archives (Aaron Swartz’s Raw Thought)
  - All: fullarchive
- Sym-poly-masthesy –> the one who was excited about Rust 7 years ago, sharing his career thoughts and miscellaneous stuff
- Antirez (Redis creator): https://antirez.com/latest/0
- https://brooker.co.za/blog/ (an AWS developer I stumbled upon when looking for distributed systems papers to read –> he wrote a guide to reading papers for SWEs, which I think is kinda helpful –> [[2025-W07#Lifelong learning]])
- https://eatonphil.com/ –> focuses on databases and distributed database systems, good read. He also has a page listing his favorite developer blogs (https://eatonphil.com/blogs.html)
- https://luminousmen.com/
Other blogs:
- https://jaehyeon.me/blog/ (DE blog)
- https://braindump.jethro.dev/ (braindump, notes collection)
- https://www.gaurgaurav.com/ (code thoughts - blogs share coding practices, programming tutorials)
- https://dataengineering.wiki/Index –> data engineering wiki, sponsored by Data Engineering Jobs and Great Expectations
- others’ threads (sharing cool readings):
  - @nqhieu2001
Research Papers
- https://jeffhuang.com/best_paper_awards/ –> best paper awards in Computer Science
- https://github.com/papers-we-love/papers-we-love –> GitHub repo, but last updated in 2023
Github resources
Books
The one about discipline (kinda self-help): Put your ass where your heart wants to be
About books: https://huyenchip.com/2022/12/27/books-for-every-engineer.html
Thanks for sharing! For 2023, consider perhaps:
- Applied Minds: How Engineers Think, by Guru Madhavan
- Out of our Minds: The Power of Being Creative, by Sir Ken Robinson
- Peak: Secrets from the New Science of Expertise, by Anders Ericsson (a great counter-read for Range by David Epstein). Ericsson is the author from whose work Gladwell (mis)appropriated the 10K Hr. rule.
- Growing Wings on the Way: Systems Thinking for Messy Situations, by Rosalind Armson (This is the book I would have written on systems thinking if I could write a book!)
Other Books
- Visualizing Google Cloud
- Architecting Google Cloud Solutions
- Building your next big thing with Google Cloud Platform
- Google Cloud Cookbook
- Data Science on the Google Cloud Platform
- Learning Google BigQuery
- The Economic Benefits of Google Cloud Data Fusion: https://services.google.com/fh/files/misc/esg-economic-validation-google-cloud-data-fusion.pdf
- System Design Interview
- Streaming Systems: Large-scale data processing
- Grokking Streaming Systems: Real-time event processing
Data Related Stuff
gitlab handbook: Data Team Learning Library
Free data source and inspiration
Free data sources:
- statista.com –> empowering people with data (insights and facts across 170 industries and 150+ countries)
- ourworldindata.org –> research and data to make progress against the world’s largest problems
- list of open data sources: data sources for journalism and research; government; science and technology; international organizations
- fivethirtyeight
- google: dataset search –> search dataset for research
- Free datasets for analytics projects: datastoryteller.gumroad.com
- github: public APIs for free: https://github.com/public-apis/public-apis
- https://power.larc.nasa.gov/
==From reddit thread:== List of open source data sources:
- https://fred.stlouisfed.org/ - US economic data
- https://www.data.gov/ - Boatloads of US government data
- https://github.com/OpportunityInsights/EconomicTracker - One of my current favorites, this is some data being used to track the US economic recovery post COVID. This has a ton of interesting things - Covid related data (including things like lockdown dates, changes in local policy, unemployment changes, etc. at the state and local levels), employment, consumer spending, education related statistics, and Google/Apple mobility reports.
- https://github.com/BuzzFeedNews - Similar to the 538 data, this is all the open source data BuzzFeed News has released. Lots of US politics here.
- https://github.com/awesomedata/awesome-public-datasets - lots and lots of random datasets broken out by category.
- https://snap.stanford.edu/data/ - Lots of social media related datasets
- https://research.google.com/youtube8m/ - 8 million categorized youtube videos
- https://research.atspotify.com/datasets/ - lots of music/podcast related data. The million playlist dataset is a pretty cool one.
- https://datasetsearch.research.google.com/ - Great tool for searching for specific datasets
Reports/dashboard inspirations:
- Power BI DataViz World Championships
- pudding.cool
Data stacks newsletters
Resource collection:
- github: data engineer handbook, all resources to learn about DE
- https://devv.ai/ –> AI search tool specialized for developers
Learning path:
- https://github.com/andkret/Cookbook –> detailed, from basic to advanced data engineering concepts; covers (almost) everything, and it is free
- data engineering mastery course –> costly, but the structure and topics outlined in this course can guide self-study
Specialized Newsletters:
- https://www.moderndatastack.xyz/categories
- https://www.blef.fr/tag/datanews/
- https://medium.com/data-monzo -> Monzo (UK financial services company) presents its data stack and business use cases
- https://blog.datahubproject.io/ -> DataHub is an open source tool for data catalog (and more) (also: https://datahubproject.io/)
- substack:
  - https://benn.substack.com/
  - https://dataanalysis.substack.com/
  - https://learnanalyticsengineering.substack.com
- https://roundup.getdbt.com/ = The analytics engineering roundup
Data Companies’ resources:
- Read: Data Insights part in https://airbyte.com/blog-categories/data-insights
- dbt: the analytics engineering roundup: https://roundup.getdbt.com/
- https://netflixtechblog.com/
- spotify data engineering blog
- uber engineering blog (data, AI/ML)
Helpful posts on programming, cloud infrastructure, AI, and software engineering topics. Short pieces on common daily encounters:
- how.wtf
Data Product mindset
- Canvas for brainstorming and doing the right thing: Data Product Canvas
- tl;dr: model data stack through the Gervais Principles
- Working effectively with ad hoc analysis requests??
For data leaders:
- constraints-driven data team design: (substack), quick note: Constraints-driven data team design
Data role should focus on impact:
- the most crucial mind shift in a data role? Focus on impact (substack) → explains how to shift your mindset toward delivering impact, applied to different data roles; clearly explains how each role can make an impact and what not to do (non-impact work)
Data Pipeline Design Patterns
- Best practices for data ingestion (also covers the different data layers of your data pipeline output): 7 best practices for data ingestion
Unit test for SQL script:
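A minimal sketch of one way to unit test a SQL script: load a small fixture into an in-memory SQLite database, run the query, and compare the rows. The `orders` table, the query, and the function names are all hypothetical, and SQLite is only a stand-in here; a real warehouse dialect may behave differently.

```python
import sqlite3

# Hypothetical SQL script under test: daily revenue per customer, paid orders only.
DAILY_REVENUE_SQL = """
SELECT customer_id, order_date, SUM(amount) AS revenue
FROM orders
WHERE status = 'paid'
GROUP BY customer_id, order_date
ORDER BY customer_id, order_date
"""


def run_query(sql, rows):
    """Load fixture rows into an in-memory SQLite table and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE orders ("
            "customer_id INTEGER, order_date TEXT, amount REAL, status TEXT)"
        )
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


def test_daily_revenue_ignores_unpaid_orders():
    fixture = [
        (1, "2024-01-01", 10.0, "paid"),
        (1, "2024-01-01", 5.0, "paid"),
        (1, "2024-01-01", 99.0, "cancelled"),  # must not be counted
        (2, "2024-01-02", 7.5, "paid"),
    ]
    result = run_query(DAILY_REVENUE_SQL, fixture)
    assert result == [(1, "2024-01-01", 15.0), (2, "2024-01-02", 7.5)]


if __name__ == "__main__":
    test_daily_revenue_ignores_unpaid_orders()
    print("ok")
```

The same test function can be collected by pytest as-is, since it follows the `test_` naming convention.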
Other:
- What are the metrics of a data platform???
Common patterns for data ingestion/data engineering??
Data Governance
Relates to:
- [[Data Governance]]
Data Lineage
- Visualize data lineage using only SQL
- Choose the right grain for your lineage model: the many layers of data lineage
How to understand and use your data? (scalable solution)
Data Observability / other: Data Quality
Relates to: [[Data Observability]]
Practices of data quality:
- tagging data models according to their importance level, for example: tier 1, critical, gold standard, etc.
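A rough sketch of what tier tagging could look like in code, assuming a three-level scheme. The `Tier` levels, the model names, and `models_requiring_alerts` are made up for illustration; in practice the tags would more likely live in the catalog or in dbt metadata.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    """Hypothetical importance levels; a lower number means more important."""
    TIER_1 = 1  # critical / gold standard
    TIER_2 = 2
    TIER_3 = 3


@dataclass(frozen=True)
class DataModel:
    name: str
    tier: Tier


def models_requiring_alerts(models, max_tier=Tier.TIER_1):
    """Return the names of models at or above the given importance level."""
    return sorted(m.name for m in models if m.tier <= max_tier)


# Illustrative catalog; the model names are invented.
catalog = [
    DataModel("fct_orders", Tier.TIER_1),
    DataModel("dim_customers", Tier.TIER_2),
    DataModel("tmp_campaign_export", Tier.TIER_3),
]

print(models_requiring_alerts(catalog))
print(models_requiring_alerts(catalog, Tier.TIER_2))
```

With tags in place, stricter tests and alerting can be applied only to the tiers that matter most.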
read more about how to measure data quality (having metrics to track your key data metrics = data quality + productivity + engagement)
Keywords:
- active metadata
Tools
Schema management (medium) (Python module: alembic)
Tips for faster, more advanced Python code:
- pythonspeed: https://pythonspeed.com/ → good and detailed articles
Style Guide for Data team
Can refer to:
- dbt style guide
- gitlab documentation strategy
- gitlab explains why handbook-first approach is important
Data contracts
Use conventional commits for data development (git):
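A small sketch of checking a commit header against the Conventional Commits shape `type(scope)!: description`. The `ALLOWED_TYPES` list is an assumption (the spec itself only defines `feat` and `fix`; the rest follow common team conventions), and `parse_commit_header` is a hypothetical helper, not part of git.

```python
import re

# Assumed team convention; only "feat" and "fix" come from the spec itself.
ALLOWED_TYPES = ("feat", "fix", "docs", "refactor", "test", "chore")

HEADER_RE = re.compile(
    r"^(?P<type>[a-z]+)"            # commit type, e.g. feat
    r"(?:\((?P<scope>[^()\s]+)\))?"  # optional scope, e.g. (dbt)
    r"(?P<breaking>!)?"             # optional breaking-change marker
    r": (?P<description>\S.*)$"     # colon, one space, then the description
)


def parse_commit_header(header):
    """Return the parsed parts of a commit header, or None if it is invalid."""
    match = HEADER_RE.match(header)
    if match is None or match.group("type") not in ALLOWED_TYPES:
        return None
    return {
        "type": match.group("type"),
        "scope": match.group("scope"),
        "breaking": match.group("breaking") is not None,
        "description": match.group("description"),
    }


print(parse_commit_header("feat(dbt): add fct_orders model"))
print(parse_commit_header("updated stuff"))
```

A check like this could run in a commit-msg hook or in CI, with scopes such as `dbt`, `airflow`, or `dashboard` chosen by the data team.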
Document Management / Document Writing
Relates to:
- [[Write documentation effectively]]
Good guide (how to organize knowledge base for data and analytics team): medium post