
Learn how to combine and pull data from multiple sources using SQL JOINs

Overview

In this episode of SQL Crash Course, we’re going to learn how to combine multiple tables in SQL and get more out of our SELECT statements using a JOIN clause. We’ll cover the 4 different types of JOIN and when we might want to use each one. Finally, we’ll wrap up by creating a database and some tables, adding some data, and implementing some JOINs ourselves.

This is episode 3 of this SQL Crash Course series. If you’re unfamiliar with SQL concepts like pulling data, creating tables, adding and deleting rows, etc. …
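As a rough illustration of what a JOIN does, here is a minimal sketch that uses Python’s built-in sqlite3 module with an in-memory database. The `customers` and `orders` tables and their rows are hypothetical and exist only for this example. It shows two JOIN types, INNER JOIN and LEFT JOIN, which is not necessarily how the article presents them. SQLite only gained RIGHT and FULL OUTER JOIN in version 3.39.0, so older builds cannot run those two.

```python
import sqlite3

# In-memory database; the "customers"/"orders" tables and rows are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
INSERT INTO customers (id, name) VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
INSERT INTO orders (id, customer_id, item) VALUES (10, 1, 'book'), (11, 1, 'pen'), (12, 2, 'laptop');
""")

# INNER JOIN: keep only rows that match in both tables (Linus has no orders, so he is dropped).
inner_rows = cur.execute("""
SELECT customers.name, orders.item
FROM customers
INNER JOIN orders ON orders.customer_id = customers.id
ORDER BY orders.id
""").fetchall()

# LEFT JOIN: keep every customer; columns from "orders" are NULL (None in Python) when nothing matches.
left_rows = cur.execute("""
SELECT customers.name, orders.item
FROM customers
LEFT JOIN orders ON orders.customer_id = customers.id
ORDER BY customers.id, orders.id
""").fetchall()

print(inner_rows)  # [('Ada', 'book'), ('Ada', 'pen'), ('Grace', 'laptop')]
print(left_rows)   # [('Ada', 'book'), ('Ada', 'pen'), ('Grace', 'laptop'), ('Linus', None)]
conn.close()
```

The only difference between the two queries is the JOIN keyword. The LEFT JOIN keeps the unmatched customer and fills the missing order columns with NULL.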



Learn the basics of creating tables, adding rows, and deleting data in SQLite from the console/terminal

Overview

In this installment of SQL Crash Course, we’re going to learn the difference between a SQL database and a SQL table, and how to create them both — we’re going to get a more complete look at SQL and SQLite data types in the process. We’re then going to learn how we can add and delete data from these tables, and how we can manage the tables within a database.

This is episode 2 of our SQL Crash Course series. The last article talked about what SQL is, why it’s useful, and how we can use it to select and filter data from SQL tables. We also touched on SQL data types, a concept used heavily in the following tutorials. …
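The article works in the SQLite console. The sketch below runs comparable SQL statements through Python’s built-in sqlite3 module instead, to show the create / insert / delete cycle in one self-contained script. The `pets` table, its columns, and its rows are hypothetical.

```python
import sqlite3

# ":memory:" keeps the sketch self-contained; a file path would create a persistent database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create a table with declared column types (SQLite applies type affinity: INTEGER, TEXT, REAL, ...).
cur.execute("""
CREATE TABLE IF NOT EXISTS pets (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    species TEXT,
    age REAL
)
""")

# Add rows; "?" placeholders let sqlite3 handle quoting safely.
pets = [("Rex", "dog", 3.5), ("Tom", "cat", 2.0), ("Nemo", "fish", 0.5)]
cur.executemany("INSERT INTO pets (name, species, age) VALUES (?, ?, ?)", pets)

# Delete the rows that match a condition.
cur.execute("DELETE FROM pets WHERE species = ?", ("fish",))
conn.commit()

remaining = cur.execute("SELECT name, species FROM pets ORDER BY id").fetchall()
print(remaining)  # [('Rex', 'dog'), ('Tom', 'cat')]

# List the tables in the database through the sqlite_master catalog, then drop one.
tables = [row[0] for row in cur.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]
print(tables)  # ['pets']
cur.execute("DROP TABLE pets")
conn.close()
```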



Covering SQL best practices, SELECT, FROM, WHERE, AND/OR, GROUP BY, ORDER BY, and more…

Overview

We’re going to learn what SQL is, what we can use it for, and then we’re going to start writing some basic queries for selecting and filtering data from a database.

Note: We won’t actually be working with a SQL database today; instead, we’ll use Python to create a dataframe and then query it with a library called pandasql. If you are unfamiliar with Python, I still suggest following the tutorial and paying attention to the SQL commands and their outputs.

What is SQL?

Data is everywhere, and more of it is being created every minute. We used to store this data on paper in giant filing cabinets, but now we store it digitally in databases. So how do we easily pull the data we want from a database? That’s what SQL is for! SQL (Structured Query Language) is a language we use to communicate with databases: if you want to pull, edit, or add data, you can use SQL to do it. Databases come in a variety of architectures (giant data lakes, simple one-table schemas) and are written in a variety of languages (C++, Java), but SQL is the common ground that lets anyone access this data using a largely standard syntax. …
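The article runs its queries through pandasql against a dataframe. The sketch below substitutes Python’s built-in sqlite3 module so it needs no extra installs. It shows how SELECT, FROM, WHERE, AND/OR, GROUP BY, and ORDER BY fit together in a query. The `sales` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Hypothetical table and rows, invented for illustration.
cur.execute("CREATE TABLE sales (region TEXT, product TEXT, units INTEGER)")
cur.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("east", "apple", 10), ("east", "pear", 4), ("west", "apple", 7),
     ("west", "pear", 12), ("north", "apple", 3)],
)

# SELECT ... FROM ... WHERE with AND/OR filters individual rows; ORDER BY sorts the result.
filtered = cur.execute("""
SELECT region, product, units
FROM sales
WHERE units > 5 AND (region = 'east' OR region = 'west')
ORDER BY units DESC
""").fetchall()
print(filtered)  # [('west', 'pear', 12), ('east', 'apple', 10), ('west', 'apple', 7)]

# GROUP BY collapses rows into one per region, and SUM aggregates within each group.
totals = cur.execute("""
SELECT region, SUM(units) AS total_units
FROM sales
GROUP BY region
ORDER BY total_units DESC
""").fetchall()
print(totals)  # [('west', 19), ('east', 14), ('north', 3)]
conn.close()
```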



A comparison between Bayesian and Frequentist inference, and two practical examples using Bayes Theorem

Article Structure:

1. What is Bayes Theorem?

2. Bayesian vs Frequentist Statistical Frameworks

3. In Practice

• Fish tank example

• Spam detection example

4. Recap of Learning

What is Bayes Theorem?

Bayes Theorem is a mathematical formula that represents the probability of an event occurring, based on prior knowledge about related conditions. Essentially, it’s a way to quantify certainty for some event happening, given some knowledge of the conditions affecting the event.

You may or may not be familiar with Bayes Theorem, so here’s a quick refresher…

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
  • P(…) represents the probability of the enclosed event occurring
  • P(A|B) can be read as “the probability of A given…
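To make the formula concrete, here is a small worked sketch in Python. It assumes hypothetical spam-filter numbers that do not come from the article. P(B) is not given directly, so it is expanded with the law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A).

```python
def bayes_posterior(prior_a: float, p_b_given_a: float, p_b_given_not_a: float) -> float:
    """Return P(A|B) from P(A), P(B|A) and P(B|not A) using Bayes' theorem."""
    # Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1.0 - prior_a)
    return p_b_given_a * prior_a / p_b

# Hypothetical numbers: A = "email is spam", B = "email contains the word 'free'".
p_spam = 0.2             # P(A): prior share of emails that are spam
p_free_given_spam = 0.6  # P(B|A)
p_free_given_ham = 0.05  # P(B|not A)

posterior = bayes_posterior(p_spam, p_free_given_spam, p_free_given_ham)
print(round(posterior, 3))  # 0.75
```

Here P(B) = 0.6 × 0.2 + 0.05 × 0.8 = 0.16, so P(A|B) = 0.12 / 0.16 = 0.75. Seeing the word raises the estimated probability of spam from the 0.2 prior to 0.75.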


Methods for quantifying error and assessing predictive performance in regression modeling

Terms to know

These terms will come up, and it’s good to get familiar with them if you aren’t already:

Regression analysis — a set of statistical processes for estimating a continuous dependent variable from one or more independent variables

Variance — a measure of how spread out the numbers in a data set are around their mean

ŷ — the estimated (predicted) value of y

ȳ — mean value of y

“Goodness of fit”

Goodness of fit is typically a term used to describe how well a dataset aligns with a certain statistical distribution. …
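For a regression model, goodness of fit is usually assessed by comparing the predictions ŷ against the observed y and the mean ȳ. The sketch below computes three common metrics: mean squared error (MSE), its square root (RMSE), and R² = 1 − SS_res / SS_tot. These are standard choices and may not be exactly the set the article goes on to cover. The data points are hypothetical.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (MSE, RMSE, R^2) for paired sequences of observed and predicted values."""
    n = len(y_true)
    y_bar = sum(y_true) / n  # ȳ: mean of the observed values
    # Residual sum of squares: variation the model fails to explain.
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    # Total sum of squares: variation of y around its mean.
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    mse = ss_res / n
    rmse = math.sqrt(mse)  # same units as y, which makes it easier to interpret
    r2 = 1.0 - ss_res / ss_tot
    return mse, rmse, r2

# Hypothetical observations (y) and model predictions (ŷ).
y = [3.0, 5.0, 7.0, 9.0]
y_hat = [2.5, 5.5, 7.0, 8.0]
mse, rmse, r2 = regression_metrics(y, y_hat)
print(f"MSE={mse:.3f} RMSE={rmse:.3f} R^2={r2:.3f}")  # MSE=0.375 RMSE=0.612 R^2=0.925
```

An R² of 1 means the predictions match the data exactly. An R² of 0 means the model does no better than always predicting ȳ.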


What is the Central Limit Theorem?

According to Google, the Central Limit Theorem (CLT) states that “under many conditions, independent random variables summed together will converge to a normal distribution as the number of variables increases.”

Essentially, the CLT says that the distribution of sample means approaches a normal distribution as the sample size grows, regardless of the shape of the original distribution (provided the samples are independent and the population has a finite variance).

Why do we need it?

Abbreviated answer: It allows us to use sample statistics to estimate population parameters, and it lets us apply normal-based methods to sample means even when the underlying data are not normally distributed. …
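A quick simulation shows the CLT in action. The sketch below repeatedly draws samples from an exponential distribution, which is strongly right-skewed and has mean 1 and standard deviation 1, and records each sample’s mean. The sample size, number of samples, and seed are arbitrary choices for illustration. If the CLT holds, the means should center on 1 with a spread close to σ/√n. About 95% of them should also fall within 1.96 standard errors of 1, as they would for a normal distribution.

```python
import math
import random
import statistics

def sample_means(draw, sample_size, n_samples, seed=0):
    """Draw n_samples samples of size sample_size and return the mean of each."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    return [
        statistics.fmean(draw(rng) for _ in range(sample_size))
        for _ in range(n_samples)
    ]

sample_size = 50
# Exponential(rate=1): skewed population with mean 1 and standard deviation 1.
means = sample_means(lambda rng: rng.expovariate(1.0), sample_size, n_samples=2000)

standard_error = 1.0 / math.sqrt(sample_size)  # σ/√n, about 0.141
grand_mean = statistics.fmean(means)
spread = statistics.stdev(means)
# Share of sample means within ±1.96 standard errors of the true mean (about 0.95 if normal).
coverage = sum(abs(m - 1.0) < 1.96 * standard_error for m in means) / len(means)

print(round(grand_mean, 3), round(spread, 3), round(coverage, 3))
```

Increasing `sample_size` makes the distribution of means tighter and closer to normal, even though each individual draw stays skewed.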



A tutorial for ImageDataGenerator’s .flow, .flow_from_dataframe and .flow_from_directory methods

Note: this tutorial assumes a basic familiarity with Keras and building neural networks.

What is ImageDataGenerator?

ImageDataGenerator is Keras’s go-to class for pipelining image data for deep learning. It provides easy access to your local file system and several methods for loading data stored in different structures. It also has some powerful data preprocessing and augmentation capabilities.

For the purposes of this tutorial, we won’t be doing much data augmentation; we’ll primarily focus on the different methods for reading data in with ImageDataGenerator.

Methods and use-cases

There are three methods at your disposal for loading data with an ImageDataGenerator, and they are typically used in the following…


Exploring data with Python can be confusing; here are three simple techniques for selecting data in Pandas that make it easy


Pandas is one of the most widely adopted and easiest-to-use libraries for working with tabular data in Python. The Python Data Analysis Library (a.k.a. Pandas) gives you data structures and visualization tools, supports exploratory data analysis (EDA), and makes manipulating your data simple.

How to Load data

Pandas is compatible with just about any data source you can think of. For this tutorial, we’re going to be using the Boston Housing Prices data set from the Scikit-Learn corpus. First, let’s load in the libraries we’re going to use:

If you don’t have these packages installed on your machine already, run the pip install package_name command from the command-line before loading them into your notebook. …
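One caveat about the dataset: scikit-learn deprecated `load_boston` in version 1.0 and removed it in 1.2, so recent installs can’t load the Boston data that way. The sketch below does not use the real dataset. It loads a tiny hypothetical CSV with three of the Boston column names (CRIM, RM, MEDV) and made-up values through `pd.read_csv`, then shows three common selection idioms: bracket selection, label-based `.loc`, and position-based `.iloc`. These may or may not be the three techniques the article covers.

```python
import io

import pandas as pd

# Hypothetical values under three real Boston column names; NOT the actual dataset.
csv_text = """CRIM,RM,MEDV
0.02,6.5,24.0
0.03,7.2,34.7
0.09,5.9,18.9
0.14,6.0,21.7
"""
# read_csv accepts any file-like object, so io.StringIO stands in for a file on disk.
df = pd.read_csv(io.StringIO(csv_text))

# 1. Bracket selection: a single name returns a Series, a list of names returns a DataFrame.
rooms = df["RM"]
subset = df[["RM", "MEDV"]]

# 2. Label-based .loc: a boolean mask picks rows (RM strictly above 6), names pick columns.
big_homes = df.loc[df["RM"] > 6, ["RM", "MEDV"]]

# 3. Position-based .iloc: the first two rows of the last column.
first_prices = df.iloc[:2, -1]

print(subset.shape)       # (4, 2)
print(big_homes)
print(first_prices.tolist())
```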



The end of “but it works fine on my computer :/”

Overview

After reading this article, you’ll have a firm understanding of…

✓ What Docker is and what it attempts to accomplish

✓ The different components within the Docker ecosystem and how they fit together

✓ How to get started with Docker (tutorial)

To get the most out of this article you would ideally have…

✓ Some sort of software development or programming experience

✓ Some familiarity with version control (e.g., Git)

✓ Experience with the command line

What is Docker?

Docker is a computer program that performs operating-system-level virtualization, also known as “containerization.”

At a high level, Docker is a tool for running applications in isolation, and it provides advantages similar to running an application inside a virtual machine. Like a VM, it lets applications run in the same, consistent environment every time, regardless of where the code actually lives or which dependencies are currently installed on the host machine. It allows you to sandbox multiple projects, keeping their requirements separate from each other, and to share each project as a single package containing both its environment requirements and its code. These packages are called images, and a running instance of an image is called a container. …


Understanding how algorithm efficiency is measured and optimized


Introduction

Time should always be on a programmer’s mind: namely, saving users and customers as much of it as possible. The less time users spend waiting, the more time they spend doing useful things with your product or service.

Just as with all aspects of life, there are many ways to solve a programming challenge. And — just as with real life — the method you choose could directly affect how long it takes to solve your problem. In this article, we’re going to explore the concept of efficiency within computer science and learn some ways to measure and describe this efficiency. …
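As a small illustration of how the chosen method changes the amount of work, the sketch below solves the same problem, finding a value in a sorted list, in two ways and counts the comparisons each one makes. Linear search checks elements one by one and grows as O(n). Binary search halves the remaining range at each step and grows as O(log n). The functions and the million-element list are illustrative choices, not taken from the article.

```python
def linear_search(items, target):
    """O(n): check each element in turn. Returns (index or -1, comparisons made)."""
    comparisons = 0
    for i, value in enumerate(items):
        comparisons += 1
        if value == target:
            return i, comparisons
    return -1, comparisons

def binary_search(items, target):
    """O(log n) on a SORTED list: halve the search range on every step."""
    lo, hi = 0, len(items) - 1
    comparisons = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if items[mid] == target:
            return mid, comparisons
        if items[mid] < target:
            lo = mid + 1   # target can only be in the upper half
        else:
            hi = mid - 1   # target can only be in the lower half
    return -1, comparisons

data = list(range(1_000_000))
print(linear_search(data, 999_999))  # (999999, 1000000)
print(binary_search(data, 999_999))  # same index, but at most 20 comparisons
```

For a million items, the linear scan may need a million comparisons, while binary search never needs more than about log₂(1,000,000) ≈ 20. Binary search only works on sorted data, though, so the right choice depends on the input.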

Sam Thurman
