Sunday, October 8, 2017

Fast and Elegant Scraping Framework for Gophers

Colly

Lightning Fast and Elegant Scraping Framework for Gophers
Colly provides a clean interface to write any kind of crawler/scraper/spider.
With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving.

Features

  • Clean API
  • Fast (>1k requests/sec on a single core)
  • Manages request delays and maximum concurrency per domain
  • Automatic cookie and session handling
  • Sync/async/parallel scraping

Example

package main

import (
	"fmt"

	"github.com/gocolly/colly"
)

func main() {
	c := colly.NewCollector()

	// Find and visit all links
	c.OnHTML("a", func(e *colly.HTMLElement) {
		link := e.Attr("href")
		fmt.Println(link)
		c.Visit(e.Request.AbsoluteURL(link))
	})

	c.Visit("https://en.wikipedia.org/")
}
See the examples folder for more detailed examples.

Language-focused docker images, minus the operating system.

"Distroless" images contain only your application and its runtime dependencies. They do not contain package managers, shells, or any other programs you would expect to find in a standard Linux distribution.
For more information, see this talk (video).

Why should I use distroless images?

Restricting what's in your runtime container to precisely what's necessary for your app is a best practice employed by Google and other tech giants that have used containers in production for many years. It improves the signal-to-noise ratio of scanners (e.g. CVE scanners) and reduces the burden of establishing provenance to just what you need.

How do I use distroless images?

These images are built using the bazel tool, but they can also be used through other Docker image build tooling.

Docker

Docker multi-stage builds make using distroless images easy. Follow these steps to get started:
  • Pick the right base image for your application stack. We publish the following distroless base images on gcr.io:
  • Write a multi-stage docker file. Note: This requires Docker 17.05 or higher.
    The basic idea is that you'll have one stage to build your application artifacts and another to insert them into your distroless runtime image. If you'd like to learn more, please see the documentation on multi-stage builds.
    Here's a quick example.
    # Start by building the application.
    FROM golang:1.8 as build
    
    WORKDIR /go/src/app
    COPY . .
    
    RUN go-wrapper download   # "go get -d -v ./..."
    RUN go-wrapper install
    
    # Now copy it into our base image.
    FROM gcr.io/distroless/base
    COPY --from=build /go/bin/app /
    CMD ["/app"]
    

Bazel

For full documentation on how to use bazel to generate Docker images, see the bazelbuild/rules_docker repository.
For documentation and examples on how to use the bazel package manager rules, see ./package_manager
Examples can be found in this repository in the examples directory.

Examples

We have examples of how to run some common application stacks in the /examples directory, along with examples of how to complete some common tasks in your image, and more information on how these images are built and released.

dist-prog-book

Programming Models for Distributed Computation

Source repo for the book that my students and I are writing, in my course at Northeastern University, CS7680 Special Topics in Computing Systems: Programming Models for Distributed Computing, on the topic of programming models for distributed systems.
This is a book about the programming constructs we use to build distributed systems. These range from the small (RPC, futures, actors) to the large: systems built up of these components, like MapReduce and Spark. We explore issues and concerns central to distributed systems, like consistency, availability, and fault tolerance, through the lens of the programming models and frameworks that the programmer uses to build these systems.
Please note that this is a work in progress: the book contents are in this repo, but we have not yet polished everything and published the final book online. Expected release: end of December
Note: the chapters can be viewed by manually going to http://dist-prog-book.com/chapter/x/article-name.html, e.g., http://dist-prog-book.com/chapter/2/futures.html. Once we finish off the chapters that need the most work, we will "release" the book by putting a proper index page in place.

Chapters

  1. RPC
  2. Futures & Promises
  3. Message-passing
  4. Distributed Programming Languages
  5. Languages Extended for Distribution
  6. CAP, Consistency, & CRDTs
  7. Programming Languages & Consistency
  8. Large-scale Parallel Batch Processing
  9. Large-scale Streaming

Editing this book

Workflow

  1. Fork/clone
  2. Edit on your local branch
  3. Make a pull request to the master branch with your changes. Do not commit directly to the repository
  4. After merge, visit the live site http://dist-prog-book.com/chapter/x/your-article.html
Note: We have CI that builds the book for each commit. Pull requests that don't build will not be merged.
Note: when PRs are merged, the site is built and redeployed automatically.

Structure

Chapters are located in the chapter folder of the root directory.

Dependencies

This site uses Jekyll, a Ruby framework. You'll need Ruby and Bundler installed.
If you already have Ruby installed, install Bundler with sudo gem install bundler

Building & Viewing

Please build and view your site locally before submitting a PR!
cd into the directory where you cloned this repository, then install the required gems with bundle install. This will automatically put the gems into ./vendor/bundle.
Start the server in the context of the bundle:
bundle exec jekyll serve
The generated site is available at http://localhost:4000
Note, this will bring you to the index page. If you'd like to see your chapter, make sure to navigate there explicitly, e.g., http://localhost:4000/chapter/1/rpc.html.

Adding/editing pages

Articles are in Markdown with straightforward YAML frontmatter.
You can include code, math (LaTeX syntax), figures, blockquotes, side notes, etc. You can also use regular BibTeX to make a bibliography. To see everything you can do, I've prepared an example article.
If you would like to add BibTeX entries to the bibliography for your chapter, check the _bibliography directory for a .bibfile named after your chapter.
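For instance, if your chapter covers futures, an entry added to a file such as _bibliography/futures.bib (the file name here is hypothetical; check the directory for the actual one) might look like:

```bibtex
@inproceedings{baker1977incremental,
  author    = {Henry C. Baker, Jr. and Carl Hewitt},
  title     = {The Incremental Garbage Collection of Processes},
  booktitle = {Proceedings of the 1977 Symposium on Artificial Intelligence and Programming Languages},
  year      = {1977}
}
```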

Explore how machine learning works, live in the browser. No coding required.

Teachable Machine

About

Teachable Machine is an experiment that makes it easier for anyone to explore machine learning, live in the browser – no coding required. Learn more about the experiment and try it yourself at g.co/teachablemachine.
The experiment is built using the deeplearn.js library.

Development

Install dependencies by running (similar to npm install)

yarn

Build project

yarn build

Start local server by running

yarn run watch

Code Styles

  • There’s a pre-commit hook set up that will prevent commits when there are errors
  • Run yarn eslint for es6 errors & warnings
  • Run yarn stylint for stylus errors & warnings

To run https locally:

https is required for camera permissions to work when not using localhost
  1. Generate keys:
openssl genrsa -out server.key 2048
openssl req -new -x509 -sha256 -key server.key -out server.cer -days 365 -subj /CN=YOUR_IP
  2. Use yarn run watch-https
  3. Go to https://YOUR_IP:3000, then accept the insecure privacy notice, and proceed.

Credit

This is not an official Google product, but an experiment that was a collaborative effort by friends from Støj, Use All Five, and the Creative Lab and PAIR teams at Google.

Drag-n-Drop Email Editor Component for React.js

React Email Editor

The excellent drag-n-drop email editor by Unroll.io as a React.js component. This is the most powerful and developer-friendly visual email builder for your app.
Video Overview
React Email Editor
Watch video overview: https://youtu.be/IoY7-NZ8TcA

Live Demo

Check out the live demo here: http://react-email-editor-demo.netlify.com/ (Source Code)

Blog Post

Installation

The easiest way to use React Email Editor is to install it from NPM and include it in your own React build process.
npm install react-email-editor --save

Usage

Require the EmailEditor component and render it with JSX:
import React, { Component } from 'react'
import { render } from 'react-dom'

import EmailEditor from 'react-email-editor'

class App extends Component {
  render() {
    return <div>
      <h1>react-email-editor Demo</h1>

      <div>
        <button onClick={this.exportHtml}>Export HTML</button>
      </div>

      <EmailEditor
        ref={designer => this.designer = designer}
      />
    </div>
  }

  exportHtml = () => {
    this.designer.exportHtml(data => {
      const { design, html } = data
      console.log('exportHtml', html)
    })
  }
}

render(<App />, document.getElementById('app'))

Properties

  • style Object style object for the editor container (default {})
  • minHeight String minimum height to initialize the editor with (default 500px)
  • onLoad Function called when the editor has finished loading
  • options Object options passed to the Unroll editor instance (default {})
See the Unroll Docs for all available options.

Methods

  • loadDesign - function(Object data) - Takes the design JSON and loads it in the editor
  • saveDesign - function(Function callback) - Returns the design JSON in a callback function
  • exportHtml - function(Function callback) - Returns the design HTML and JSON in a callback function
See the example source for a reference implementation.

The Darwin Kernel (mirror)

What is XNU?

The XNU kernel is part of the Darwin operating system for use in the OS X and iOS operating systems. XNU is an acronym for XNU is Not Unix. XNU is a hybrid kernel combining the Mach kernel developed at Carnegie Mellon University with components from FreeBSD, plus a C++ API for writing drivers called IOKit. XNU runs on I386 and X86_64, in both single-processor and multi-processor configurations.

XNU Source Tree

  • config - configurations of exported APIs for supported architectures and platforms
  • SETUP - Basic set of tools used for configuring the kernel, versioning and kextsymbol management.
  • EXTERNAL_HEADERS - Headers sourced from other projects to avoid dependency cycles when building. These headers should be regularly synced when source is updated.
  • libkern - C++ IOKit library code for handling of drivers and kexts.
  • libsa - kernel bootstrap code for startup
  • libsyscall - syscall library interface for userspace programs
  • libkdd - source for user library for parsing kernel data like kernel chunked data.
  • makedefs - top level rules and defines for kernel build.
  • osfmk - Mach kernel based subsystems
  • pexpert - Platform specific code like interrupt handling, atomics etc.
  • security - Mandatory Access Check policy interfaces and related implementation.
  • bsd - BSD subsystems code
  • tools - A set of utilities for testing, debugging and profiling kernel.

How to build XNU

Building DEVELOPMENT kernel

The xnu make system can build a kernel based on the KERNEL_CONFIGS and ARCH_CONFIGS variables passed as arguments. Here is the syntax:
make SDKROOT=<sdkroot> ARCH_CONFIGS=<arch> KERNEL_CONFIGS=<variant>
Where:
  • <sdkroot>: path to the MacOS SDK on disk (defaults to /)
  • <variant>: one of debug, development, release, or profile; configures compilation flags and asserts throughout the kernel code.
  • <arch>: a valid arch to build for (e.g. i386 or X86_64)
To build a kernel for the same architecture as the running OS, just type
$ make
$ make SDKROOT=macosx.internal
Additionally, there is support for configuring architectures through ARCH_CONFIGS and kernel configurations with KERNEL_CONFIGS.
$ make SDKROOT=macosx.internal ARCH_CONFIGS=X86_64 KERNEL_CONFIGS=DEVELOPMENT
$ make SDKROOT=macosx.internal ARCH_CONFIGS=X86_64 KERNEL_CONFIGS="RELEASE DEVELOPMENT DEBUG"
Note:
  • By default, architecture is set to the build machine architecture, and the default kernel config is set to build for DEVELOPMENT.
This will also create a bootable image, kernel.[config], and a kernel binary with symbols, kernel.[config].unstripped.
  • To build with RELEASE kernel configuration
    make KERNEL_CONFIGS=RELEASE SDKROOT=/path/to/SDK
    

Building FAT kernel binary

Define architectures in your environment or when running a make command.
$ make ARCH_CONFIGS="I386 X86_64" exporthdrs all

Other makefile options

  • $ make MAKEJOBS=-j8 # this will use 8 processes during the build. The default is 2x the number of active CPUs.
  • $ make -j8 # the standard command-line option is also accepted
  • $ make -w # trace recursive make invocations. Useful in combination with VERBOSE=YES
  • $ make BUILD_LTO=0 # build without LLVM Link Time Optimization
  • $ make REMOTEBUILD=user@remotehost # perform build on remote host
  • $ make BUILD_JSON_COMPILATION_DATABASE=1 # Build Clang JSON Compilation Database
The XNU build system can optionally output color-formatted build output. To enable this, you can either set the XNU_LOGCOLORS environment variable to y, or you can pass LOGCOLORS=y to the make command.

Debug information formats

By default, a DWARF debug information repository is created during the install phase; this is a "bundle" named kernel.development.<variant>.dSYM. To select the older STABS debug information format (where debug information is embedded in the kernel.development.unstripped image), set the BUILD_STABS environment variable.
$ export BUILD_STABS=1
$ make

Building KernelCaches

To test the xnu kernel, you need to build a kernelcache that links the kexts and kernel together into a single bootable image. To build a kernelcache you can use the following mechanisms:
  • Using automatic kernelcache generation with kextd. The kextd daemon watches for changes in the /System/Library/Extensions directory, so you can set up the new kernel as follows:
    $ cp BUILD/obj/DEVELOPMENT/X86_64/kernel.development /System/Library/Kernels/
    $ touch /System/Library/Extensions
    $ ps -e | grep kextd
    
  • Manually invoking kextcache to build new kernelcache.
    $ kextcache -q -z -a x86_64 -l -n -c /var/tmp/kernelcache.test -K /var/tmp/kernel.test /System/Library/Extensions
    

Running KernelCache on Target machine

The development kernel and iBoot support configuring boot arguments so that we can safely boot into a test kernel and, if things go wrong, safely fall back to the previously used kernelcache. Following are the steps to get such a setup:
  1. Create kernel cache using the kextcache command as /kernelcache.test
  2. Copy existing boot configurations to an alternate file
    $ cp /Library/Preferences/SystemConfiguration/com.apple.Boot.plist /next_boot.plist
    
  3. Update the kernelcache and boot-args for your setup
    $ plutil -insert "Kernel Cache" -string "kernelcache.test" /next_boot.plist
    $ plutil -replace "Kernel Flags" -string "debug=0x144 -v kernelsuffix=test " /next_boot.plist
    
  4. Copy the new config to /Library/Preferences/SystemConfiguration/
    $ cp /next_boot.plist /Library/Preferences/SystemConfiguration/boot.plist
    
  5. Bless the volume with new configs.
    $ sudo -n bless  --mount / --setBoot --nextonly --options "config=boot"
    
    The --nextonly flag specifies that the boot.plist configs be used for one boot only. So if the kernel panics, you can simply power-cycle the machine and recover the original kernel.

Creating tags and cscope

Set up your build environment and from the top directory, run:
$ make tags     # this will build ctags and etags on a case-sensitive volume, only ctags on case-insensitive
$ make TAGS     # this will build etags
$ make cscope   # this will build cscope database

Coding styles (Reindenting files)

Source files can be reindented using the clang-format setup in .clang-format. XNU follows a variant of the WebKit style for source code formatting. Please refer to the format styles on the WebKit website. Further details about style options are available in the clang docs.
Note: the clang-format binary may not be part of the base installation. It can be compiled from llvm clang sources; make sure it is reachable in your $PATH.
From the top directory, run:
$ make reindent # reindent all source files using clang format.

How to install a new header file from XNU

To install IOKit headers, see additional comments in iokit/IOKit/Makefile.
XNU installs header files at the following locations -
a. $(DSTROOT)/System/Library/Frameworks/Kernel.framework/Headers
b. $(DSTROOT)/System/Library/Frameworks/Kernel.framework/PrivateHeaders
c. $(DSTROOT)/usr/include/
d. $(DSTROOT)/System/Library/Frameworks/System.framework/PrivateHeaders
Kernel.framework is used by kernel extensions.
The System.framework and /usr/include are used by user level applications. 
The header files in a framework's PrivateHeaders are only available for ** Apple Internal Development **.
The directory containing the header file should have a Makefile that creates the list of files that should be installed at different locations. If you are adding the first header file in a directory, you will need to create a Makefile similar to xnu/bsd/sys/Makefile.
Add your header file to the correct file list depending on where you want to install it. The default locations where the header files are installed from each file list are -
a. `DATAFILES` : To make header file available in user level -
   `$(DSTROOT)/usr/include`

b. `PRIVATE_DATAFILES` : To make header file available to Apple internal in
   user level -
   `$(DSTROOT)/System/Library/Frameworks/System.framework/PrivateHeaders`

c. `KERNELFILES` : To make header file available in kernel level -
   `$(DSTROOT)/System/Library/Frameworks/Kernel.framework/Headers`
   `$(DSTROOT)/System/Library/Frameworks/Kernel.framework/PrivateHeaders`

d. `PRIVATE_KERNELFILES` : To make header file available to Apple internal
   for kernel extensions -
   `$(DSTROOT)/System/Library/Frameworks/Kernel.framework/PrivateHeaders`
The Makefile combines the file lists mentioned above into different install lists which are used by build system to install the header files.
If the install list that you are interested in does not exist, create it by adding the appropriate file lists. The default install lists, their member file lists, and their default locations are described below -
a. `INSTALL_MI_LIST` : Installs header file to a location that is available to everyone in user level.
    Locations -
       $(DSTROOT)/usr/include
   Definition -
       INSTALL_MI_LIST = ${DATAFILES}

b.  `INSTALL_MI_LCL_LIST` : Installs header file to a location that is available
   for Apple internal in user level.
   Locations -
       $(DSTROOT)/System/Library/Frameworks/System.framework/PrivateHeaders
   Definition -
       INSTALL_MI_LCL_LIST = ${PRIVATE_DATAFILES}

c. `INSTALL_KF_MI_LIST` : Installs header file to location that is available
   to everyone for kernel extensions.
   Locations -
        $(DSTROOT)/System/Library/Frameworks/Kernel.framework/Headers
   Definition -
        INSTALL_KF_MI_LIST = ${KERNELFILES}

d. `INSTALL_KF_MI_LCL_LIST` : Installs header file to location that is
   available for Apple internal for kernel extensions.
   Locations -
        $(DSTROOT)/System/Library/Frameworks/Kernel.framework/PrivateHeaders
   Definition -
        INSTALL_KF_MI_LCL_LIST = ${KERNELFILES} ${PRIVATE_KERNELFILES}

e. `EXPORT_MI_LIST` : Exports header file to all of xnu (bsd/, osfmk/, etc.)
   for compilation only. Does not install anything into the SDK.
   Definition -
        EXPORT_MI_LIST = ${KERNELFILES} ${PRIVATE_KERNELFILES}
If you want to install the header file in a sub-directory of the paths described in (1), specify the directory name using two variables INSTALL_MI_DIR and EXPORT_MI_DIR as follows -
INSTALL_MI_DIR = dirname
EXPORT_MI_DIR = dirname
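Putting the pieces together, a minimal sketch of such a Makefile for a hypothetical directory bsd/foo/ (all file names invented for illustration) might be:

```make
# Hypothetical Makefile fragment for bsd/foo/ -- names are invented
DATAFILES = foo.h                      # public user-level header
PRIVATE_DATAFILES = foo_private.h      # Apple-internal, user level
KERNELFILES = foo.h foo_kernel.h       # kernel/kext level

INSTALL_MI_LIST = ${DATAFILES}
INSTALL_MI_LCL_LIST = ${PRIVATE_DATAFILES}
INSTALL_KF_MI_LIST = ${KERNELFILES}
EXPORT_MI_LIST = ${KERNELFILES}

# install and export under a foo/ sub-directory
INSTALL_MI_DIR = foo
EXPORT_MI_DIR = foo
```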
A single header file can exist at different locations using the steps mentioned above. However, it might not be desirable to make all the code in the header file available at all the locations. For example, you may want to export a function only to kernel level but not user level.
You can use the C language's pre-processor directives (#ifdef, #endif, #ifndef) to control the text generated before a header file is installed. The kernel only includes the code if the conditional macro is TRUE and strips out code for FALSE conditions from the header file.
Some pre-defined macros and their descriptions are -
a. `PRIVATE` : If true, code is available to all of the xnu kernel and is
   not available in kernel extensions and user level header files.  The
   header files installed in all the paths described above in (1) will not
   have code enclosed within this macro.

b. `KERNEL_PRIVATE` : If true, code is available to all of the xnu kernel and Apple
    internal kernel extensions.

c. `BSD_KERNEL_PRIVATE` : If true, code is available to the xnu/bsd part of
   the kernel and is not available to rest of the kernel, kernel extensions
   and user level header files.  The header files installed in all the
   paths described above in (1) will not have code enclosed within this macro.

d. `KERNEL` :  If true, code is available only in kernel and kernel
   extensions and is not available in user level header files.  Only the
   header files installed in following paths will have the code -

        $(DSTROOT)/System/Library/Frameworks/Kernel.framework/Headers
        $(DSTROOT)/System/Library/Frameworks/Kernel.framework/PrivateHeaders

   You should check the Testing the kernel section below for details.
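As an illustration of these macros, a hypothetical header (file and function names invented) might expose different declarations at each level:

```c
/* foo.h -- hypothetical header; names invented for illustration.
 * The macros below are defined (or not) by the xnu build and header
 * install machinery, which strips the guarded code where appropriate. */
#ifndef _FOO_H_
#define _FOO_H_

int foo_public(int x);            /* visible at every level */

#ifdef KERNEL
int foo_kext(int x);              /* kernel and kernel extensions only */
#endif /* KERNEL */

#ifdef KERNEL_PRIVATE
int foo_apple_internal(int x);    /* xnu and Apple-internal kexts */
#endif /* KERNEL_PRIVATE */

#ifdef BSD_KERNEL_PRIVATE
int foo_bsd_internal(int x);      /* xnu/bsd only */
#endif /* BSD_KERNEL_PRIVATE */

#endif /* _FOO_H_ */
```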

How to add a new syscall

Testing the kernel

The XNU kernel has multiple mechanisms for testing.
  • Assertions - The DEVELOPMENT and DEBUG kernel configs are compiled with assertions enabled. This allows developers to easily test invariants and conditions.
  • XNU Power On Self Tests (XNUPOST): The XNUPOST config allows for building the kernel with a basic set of test functions that are run before the first user-space process is launched. Since XNU is a hybrid of Mach and BSD, we have two locations where tests can be added.
    xnu/osfmk/tests/     # For testing mach based kernel structures and apis.
    bsd/tests/           # For testing BSD interfaces.
    
    Please follow the documentation at osfmk/tests/README.md
  • User level tests: The tools/tests/ directory holds all the tests that verify syscalls and other features of the xnu kernel. The make target xnu_tests can be used to build all the tests supported.
    $ make RC_ProjectName=xnu_tests SDKROOT=/path/to/SDK
    
    These tests are individual programs that can be run from Terminal and report test status by means of standard POSIX exit codes (0 -> success) and/or stdout. Please read the detailed documentation in tools/tests/unit_tests/README.md

Kernel data descriptors

XNU uses different data formats for passing data in its APIs. The most standard way is using syscall arguments, but for complex data it often relies on sending memory laid out as C structs. This packaged data transport mechanism is fragile and leads to broken interfaces between user-space programs and kernel APIs. The libkdd directory holds a user-space library that can parse custom data provided by the same version of the kernel. The kernel chunked data format is described in detail in libkdd/README.md.

Debugging the kernel

The xnu kernel supports debugging with a remote kernel debugging protocol (kdp). Please refer to the documentation in Technical Note TN2063. By default, the kernel is set up to reboot on a panic. To debug a live kernel, the kdp server is set up to listen for UDP connections over ethernet. For machines without an ethernet port, this behavior can be altered with kernel boot-args. Following are some common options:
  • debug=0x144 - sets up debug variables to start the kdp debug server on panic
  • -v - print kernel logs on screen. By default XNU only shows grey screen with boot art.
  • kdp_match_name=en1 - Override default port selection for kdp. Supported for ethernet, thunderbolt and serial debugging.
To debug a panicked kernel, use the LLVM debugger (lldb) along with the unstripped, symbol-rich kernel binary.
sh$ lldb kernel.development.unstripped
And then you can connect to the panicked machine with the kdp_remote [ip addr] or gdb_remote [hostip : port] commands.
Each kernel is packaged with kernel-specific debug scripts as part of the build process. For security reasons, these special commands and scripts do not get loaded automatically when lldb is connected to a machine. Please add the following setting to your ~/.lldbinit if you wish to always load these macros:
settings set target.load-script-from-symbol-file true
The tools/lldbmacros directory contains the source for each of these commands. Please follow the README.md for a detailed explanation of the commands and their usage.