Sunday, October 8, 2017

Fast and Elegant Scraping Framework for Gophers

Colly

Lightning Fast and Elegant Scraping Framework for Gophers
Colly provides a clean interface to write any kind of crawler/scraper/spider.
With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving.

Features

  • Clean API
  • Fast (>1k requests/sec on a single core)
  • Manages request delays and maximum concurrency per domain
  • Automatic cookie and session handling
  • Sync/async/parallel scraping

Example

package main

import (
	"fmt"

	"github.com/gocolly/colly"
)

func main() {
	c := colly.NewCollector()

	// Find and visit all links
	c.OnHTML("a", func(e *colly.HTMLElement) {
		link := e.Attr("href")
		fmt.Println(link)
		c.Visit(e.Request.AbsoluteURL(link))
	})

	c.Visit("https://en.wikipedia.org/")
}
See the examples folder for more detailed examples.

Language-focused docker images, minus the operating system.

"Distroless" images contain only your application and its runtime dependencies. They do not contain package managers, shells, or any other programs you would expect to find in a standard Linux distribution.
For more information, see this talk (video).

Why should I use distroless images?

Restricting what's in your runtime container to precisely what's necessary for your app is a best practice employed by Google and other tech giants that have used containers in production for many years. It improves the signal-to-noise ratio of scanners (e.g. CVE scanners) and reduces the burden of establishing provenance to just what you need.

How do I use distroless images?

These images are built using the bazel tool, but they can also be used through other Docker image build tooling.

Docker

Docker multi-stage builds make using distroless images easy. Follow these steps to get started:
  • Pick the right base image for your application stack. We publish the following distroless base images on gcr.io:
  • Write a multi-stage docker file. Note: This requires Docker 17.05 or higher.
    The basic idea is that you'll have one stage to build your application artifacts and another to insert them into your distroless runtime image. If you'd like to learn more, please see the documentation on multi-stage builds.
    Here's a quick example.
    # Start by building the application.
    FROM golang:1.8 as build
    
    WORKDIR /go/src/app
    COPY . .
    
    RUN go-wrapper download   # "go get -d -v ./..."
    RUN go-wrapper install
    
    # Now copy it into our base image.
    FROM gcr.io/distroless/base
    COPY --from=build /go/bin/app /
    CMD ["/app"]
    

Bazel

For full documentation on how to use bazel to generate Docker images, see the bazelbuild/rules_docker repository.
For documentation and examples on how to use the bazel package manager rules, see ./package_manager
Examples can be found in this repository in the examples directory.

Examples

We have examples of how to run some common application stacks in the /examples directory, along with examples of how to complete some common tasks in your image, and more information on how these images are built and released.

dist-prog-book

Programming Models for Distributed Computation

Source repo for the book that my students and I are writing, in my course at Northeastern University, CS7680 Special Topics in Computing Systems: Programming Models for Distributed Computing, on the topic of programming models for distributed systems.
This is a book about the programming constructs we use to build distributed systems. These range from the small (RPC, futures, actors) to the large: systems built up of these components, like MapReduce and Spark. We explore issues and concerns central to distributed systems, like consistency, availability, and fault tolerance, through the lens of the programming models and frameworks that the programmer uses to build these systems.
Please note that this is a work in progress: the book contents are in this repo, but we have not yet polished everything and published the final book online. Expected release: end of December
Note: the chapters can be viewed by manually going to http://dist-prog-book.com/chapter/x/article-name.html, e.g., http://dist-prog-book.com/chapter/2/futures.html. Once we finish off the chapters that need the most work, we will "release" the book by putting a proper index page in place.

Chapters

  1. RPC
  2. Futures & Promises
  3. Message-passing
  4. Distributed Programming Languages
  5. Languages Extended for Distribution
  6. CAP, Consistency, & CRDTs
  7. Programming Languages & Consistency
  8. Large-scale Parallel Batch Processing
  9. Large-scale Streaming

Editing this book

Workflow

  1. Fork/clone
  2. Edit on your local branch
  3. Make a pull request to the master branch with your changes. Do not commit directly to the repository
  4. After merge, visit the live site http://dist-prog-book.com/chapter/x/your-article.html
Note: We have CI that builds the book for each commit. Pull requests that don't build will not be merged.
Note: when PRs are merged, the site is built and redeployed automatically.

Structure

Chapters are located in the chapter folder of the root directory.

Dependencies

This site uses Jekyll, a Ruby framework. You'll need Ruby and Bundler installed.
If you already have Ruby installed, install Bundler with sudo gem install bundler

Building & Viewing

Please build and view your site locally before submitting a PR!
cd into the directory where you cloned this repository, then install the required gems with bundle install. This will automatically put the gems into ./vendor/bundle.
Start the server in the context of the bundle:
bundle exec jekyll serve
The generated site is available at http://localhost:4000
Note, this will bring you to the index page. If you'd like to see your chapter, make sure to navigate there explicitly, e.g., http://localhost:4000/chapter/1/rpc.html.

Adding/editing pages

Articles are in Markdown with straightforward YAML frontmatter.
You can include code, math (LaTeX syntax), figures, blockquotes, side notes, etc. You can also use regular BibTeX to make a bibliography. To see everything you can do, I've prepared an example article.
If you would like to add BibTeX entries to the bibliography for your chapter, check the _bibliography directory for a .bibfile named after your chapter.
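For instance, if your chapter covers futures, an entry added to a file such as _bibliography/futures.bib (the file name here is hypothetical; check the directory for the actual one) might look like:

```bibtex
@inproceedings{baker1977incremental,
  author    = {Henry C. Baker, Jr. and Carl Hewitt},
  title     = {The Incremental Garbage Collection of Processes},
  booktitle = {Proceedings of the 1977 Symposium on Artificial Intelligence and Programming Languages},
  year      = {1977}
}
```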

Explore how machine learning works, live in the browser. No coding required.

Teachable Machine

About

Teachable Machine is an experiment that makes it easier for anyone to explore machine learning, live in the browser – no coding required. Learn more about the experiment and try it yourself at g.co/teachablemachine.
The experiment is built using the deeplearn.js library.

Development

Install dependencies by running (similar to npm install)

yarn

Build project

yarn build

Start local server by running

yarn run watch

Code Styles

  • There’s a pre-commit hook set up that will prevent commits when there are errors
  • Run yarn eslint for es6 errors & warnings
  • Run yarn stylint for stylus errors & warnings

To run https locally:

https is required for camera permissions to work when not using localhost
  1. Generate keys:
openssl genrsa -out server.key 2048
openssl req -new -x509 -sha256 -key server.key -out server.cer -days 365 -subj /CN=YOUR_IP
  2. Use yarn run watch-https
  3. Go to https://YOUR_IP:3000, then accept the insecure privacy notice, and proceed.

Credit

This is not an official Google product, but an experiment that was a collaborative effort by friends from Støj, Use All Five, and the Creative Lab and PAIR teams at Google.

Drag-n-Drop Email Editor Component for React.js

React Email Editor

The excellent drag-n-drop email editor by Unroll.io as a React.js component. This is the most powerful and developer-friendly visual email builder for your app.
Video Overview
React Email Editor
Watch video overview: https://youtu.be/IoY7-NZ8TcA

Live Demo

Check out the live demo here: http://react-email-editor-demo.netlify.com/ (Source Code)

Blog Post

Installation

The easiest way to use React Email Editor is to install it from NPM and include it in your own React build process.
npm install react-email-editor --save

Usage

Require the EmailEditor component and render it with JSX:
import React, { Component } from 'react'
import { render } from 'react-dom'

import EmailEditor from 'react-email-editor'

class App extends Component {
  render() {
    return <div>
      <h1>react-email-editor Demo</h1>

      <div>
        <button onClick={this.exportHtml}>Export HTML</button>
      </div>

      <EmailEditor
        ref={designer => this.designer = designer}
      />
    </div>
  }

  exportHtml = () => {
    this.designer.exportHtml(data => {
      const { design, html } = data
      console.log('exportHtml', html)
    })
  }
}

render(<App />, document.getElementById('app'))

Properties

  • style Object style object for the editor container (default {})
  • minHeight String minimum height to initialize the editor with (default 500px)
  • onLoad Function called when the editor has finished loading
  • options Object options passed to the Unroll editor instance (default {})
See the Unroll Docs for all available options.

Methods

  • loadDesign - function(Object data) - Takes the design JSON and loads it in the editor
  • saveDesign - function(Function callback) - Returns the design JSON in a callback function
  • exportHtml - function(Function callback) - Returns the design HTML and JSON in a callback function
See the example source for a reference implementation.

The Darwin Kernel (mirror)

What is XNU?

The XNU kernel is part of the Darwin operating system for use in the OS X and iOS operating systems. XNU is an acronym for XNU is Not Unix. XNU is a hybrid kernel combining the Mach kernel developed at Carnegie Mellon University with components from FreeBSD, plus a C++ API for writing drivers called IOKit. XNU runs on I386 and X86_64, in both single-processor and multi-processor configurations.

XNU Source Tree

  • config - configurations of exported APIs for supported architectures and platforms
  • SETUP - Basic set of tools used for configuring the kernel, versioning and kextsymbol management.
  • EXTERNAL_HEADERS - Headers sourced from other projects to avoid dependency cycles when building. These headers should be regularly synced when source is updated.
  • libkern - C++ IOKit library code for handling of drivers and kexts.
  • libsa - kernel bootstrap code for startup
  • libsyscall - syscall library interface for userspace programs
  • libkdd - source for user library for parsing kernel data like kernel chunked data.
  • makedefs - top level rules and defines for kernel build.
  • osfmk - Mach kernel based subsystems
  • pexpert - Platform specific code like interrupt handling, atomics etc.
  • security - Mandatory Access Check policy interfaces and related implementation.
  • bsd - BSD subsystems code
  • tools - A set of utilities for testing, debugging and profiling kernel.

How to build XNU

Building DEVELOPMENT kernel

The xnu make system can build a kernel based on the KERNEL_CONFIGS and ARCH_CONFIGS variables passed as arguments. Here is the syntax:
make SDKROOT=<sdkroot> ARCH_CONFIGS=<arch> KERNEL_CONFIGS=<variant>
Where:
  • <sdkroot>: path to the MacOS SDK on disk (defaults to /)
  • <variant>: one of debug, development, release, or profile; configures compilation flags and asserts throughout the kernel code.
  • <arch>: a valid arch to build for (e.g. i386 or X86_64)
To build a kernel for the same architecture as the running OS, just type
$ make
$ make SDKROOT=macosx.internal
Additionally, there is support for configuring architectures through ARCH_CONFIGS and kernel configurations with KERNEL_CONFIGS.
$ make SDKROOT=macosx.internal ARCH_CONFIGS=X86_64 KERNEL_CONFIGS=DEVELOPMENT
$ make SDKROOT=macosx.internal ARCH_CONFIGS=X86_64 KERNEL_CONFIGS="RELEASE DEVELOPMENT DEBUG"
Note:
  • By default, architecture is set to the build machine architecture, and the default kernel config is set to build for DEVELOPMENT.
This will also create a bootable image, kernel.[config], and a kernel binary with symbols, kernel.[config].unstripped.
  • To build with RELEASE kernel configuration
    make KERNEL_CONFIGS=RELEASE SDKROOT=/path/to/SDK
    

Building FAT kernel binary

Define architectures in your environment or when running a make command.
$ make ARCH_CONFIGS="I386 X86_64" exporthdrs all

Other makefile options

  • $ make MAKEJOBS=-j8 # this will use 8 processes during the build. The default is 2x the number of active CPUs.
  • $ make -j8 # the standard command-line option is also accepted
  • $ make -w # trace recursive make invocations. Useful in combination with VERBOSE=YES
  • $ make BUILD_LTO=0 # build without LLVM Link Time Optimization
  • $ make REMOTEBUILD=user@remotehost # perform build on remote host
  • $ make BUILD_JSON_COMPILATION_DATABASE=1 # Build Clang JSON Compilation Database
The XNU build system can optionally output color-formatted build output. To enable this, you can either set the XNU_LOGCOLORS environment variable to y, or you can pass LOGCOLORS=y to the make command.

Debug information formats

By default, a DWARF debug information repository is created during the install phase; this is a "bundle" named kernel.development.<variant>.dSYM. To select the older STABS debug information format (where debug information is embedded in the kernel.development.unstripped image), set the BUILD_STABS environment variable.
$ export BUILD_STABS=1
$ make

Building KernelCaches

To test the xnu kernel, you need to build a kernelcache that links the kexts and kernel together into a single bootable image. To build a kernelcache you can use the following mechanisms:
  • Using automatic kernelcache generation with kextd. The kextd daemon watches for changes in the /System/Library/Extensions directory, so you can set up the new kernel as follows:
    $ cp BUILD/obj/DEVELOPMENT/X86_64/kernel.development /System/Library/Kernels/
    $ touch /System/Library/Extensions
    $ ps -e | grep kextd
    
  • Manually invoking kextcache to build new kernelcache.
    $ kextcache -q -z -a x86_64 -l -n -c /var/tmp/kernelcache.test -K /var/tmp/kernel.test /System/Library/Extensions
    

Running KernelCache on Target machine

The development kernel and iBoot support configuring boot arguments so that we can safely boot into a test kernel and, if things go wrong, safely fall back to the previously used kernelcache. Following are the steps to get such a setup:
  1. Create kernel cache using the kextcache command as /kernelcache.test
  2. Copy existing boot configurations to an alternate file
    $ cp /Library/Preferences/SystemConfiguration/com.apple.Boot.plist /next_boot.plist
    
  3. Update the kernelcache and boot-args for your setup
    $ plutil -insert "Kernel Cache" -string "kernelcache.test" /next_boot.plist
    $ plutil -replace "Kernel Flags" -string "debug=0x144 -v kernelsuffix=test " /next_boot.plist
    
  4. Copy the new config to /Library/Preferences/SystemConfiguration/
    $ cp /next_boot.plist /Library/Preferences/SystemConfiguration/boot.plist
    
  5. Bless the volume with new configs.
    $ sudo -n bless  --mount / --setBoot --nextonly --options "config=boot"
    
    The --nextonly flag specifies that the boot.plist configs be used for one boot only. So if the kernel panics, you can simply power-cycle the machine and recover the original kernel.

Creating tags and cscope

Set up your build environment and from the top directory, run:
$ make tags     # this will build ctags and etags on a case-sensitive volume, only ctags on case-insensitive
$ make TAGS     # this will build etags
$ make cscope   # this will build cscope database

Coding styles (Reindenting files)

Source files can be reindented using the clang-format setup in .clang-format. XNU follows a variant of the WebKit style for source code formatting. Please refer to the format styles on the WebKit website. Further details about style options are available in the clang docs.
Note: the clang-format binary may not be part of the base installation. It can be compiled from llvm clang sources; make sure it is reachable in your $PATH.
From the top directory, run:
$ make reindent # reindent all source files using clang format.

How to install a new header file from XNU

To install IOKit headers, see additional comments in iokit/IOKit/Makefile.
XNU installs header files at the following locations -
a. $(DSTROOT)/System/Library/Frameworks/Kernel.framework/Headers
b. $(DSTROOT)/System/Library/Frameworks/Kernel.framework/PrivateHeaders
c. $(DSTROOT)/usr/include/
d. $(DSTROOT)/System/Library/Frameworks/System.framework/PrivateHeaders
Kernel.framework is used by kernel extensions.
The System.framework and /usr/include are used by user level applications. 
The header files in a framework's PrivateHeaders are only available for ** Apple Internal Development **.
The directory containing the header file should have a Makefile that creates the list of files that should be installed at different locations. If you are adding the first header file in a directory, you will need to create a Makefile similar to xnu/bsd/sys/Makefile.
Add your header file to the correct file list depending on where you want to install it. The default locations where the header files are installed from each file list are -
a. `DATAFILES` : To make header file available in user level -
   `$(DSTROOT)/usr/include`

b. `PRIVATE_DATAFILES` : To make header file available to Apple internal in
   user level -
   `$(DSTROOT)/System/Library/Frameworks/System.framework/PrivateHeaders`

c. `KERNELFILES` : To make header file available in kernel level -
   `$(DSTROOT)/System/Library/Frameworks/Kernel.framework/Headers`
   `$(DSTROOT)/System/Library/Frameworks/Kernel.framework/PrivateHeaders`

d. `PRIVATE_KERNELFILES` : To make header file available to Apple internal
   for kernel extensions -
   `$(DSTROOT)/System/Library/Frameworks/Kernel.framework/PrivateHeaders`
The Makefile combines the file lists mentioned above into different install lists which are used by build system to install the header files.
If the install list that you are interested in does not exist, create it by adding the appropriate file lists. The default install lists, their member file lists, and their default locations are described below -
a. `INSTALL_MI_LIST` : Installs header file to a location that is available to everyone in user level.
    Locations -
       $(DSTROOT)/usr/include
   Definition -
       INSTALL_MI_LIST = ${DATAFILES}

b.  `INSTALL_MI_LCL_LIST` : Installs header file to a location that is available
   for Apple internal in user level.
   Locations -
       $(DSTROOT)/System/Library/Frameworks/System.framework/PrivateHeaders
   Definition -
       INSTALL_MI_LCL_LIST = ${PRIVATE_DATAFILES}

c. `INSTALL_KF_MI_LIST` : Installs header file to location that is available
   to everyone for kernel extensions.
   Locations -
        $(DSTROOT)/System/Library/Frameworks/Kernel.framework/Headers
   Definition -
        INSTALL_KF_MI_LIST = ${KERNELFILES}

d. `INSTALL_KF_MI_LCL_LIST` : Installs header file to location that is
   available for Apple internal for kernel extensions.
   Locations -
        $(DSTROOT)/System/Library/Frameworks/Kernel.framework/PrivateHeaders
   Definition -
        INSTALL_KF_MI_LCL_LIST = ${KERNELFILES} ${PRIVATE_KERNELFILES}

e. `EXPORT_MI_LIST` : Exports header file to all of xnu (bsd/, osfmk/, etc.)
   for compilation only. Does not install anything into the SDK.
   Definition -
        EXPORT_MI_LIST = ${KERNELFILES} ${PRIVATE_KERNELFILES}
If you want to install the header file in a sub-directory of the paths described in (1), specify the directory name using two variables INSTALL_MI_DIR and EXPORT_MI_DIR as follows -
INSTALL_MI_DIR = dirname
EXPORT_MI_DIR = dirname
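Putting the pieces together, a minimal sketch of such a Makefile for a hypothetical directory bsd/foo/ (all file names invented for illustration) might be:

```make
# Hypothetical Makefile fragment for bsd/foo/ -- names are invented
DATAFILES = foo.h                      # public user-level header
PRIVATE_DATAFILES = foo_private.h      # Apple-internal, user level
KERNELFILES = foo.h foo_kernel.h       # kernel/kext level

INSTALL_MI_LIST = ${DATAFILES}
INSTALL_MI_LCL_LIST = ${PRIVATE_DATAFILES}
INSTALL_KF_MI_LIST = ${KERNELFILES}
EXPORT_MI_LIST = ${KERNELFILES}

# install and export under a foo/ sub-directory
INSTALL_MI_DIR = foo
EXPORT_MI_DIR = foo
```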
A single header file can exist at different locations using the steps mentioned above. However, it might not be desirable to make all the code in the header file available at all the locations. For example, you may want to export a function only to kernel level but not user level.
You can use the C language's pre-processor directives (#ifdef, #endif, #ifndef) to control the text generated before a header file is installed. The kernel only includes the code if the conditional macro is TRUE and strips out code for FALSE conditions from the header file.
Some pre-defined macros and their descriptions are -
a. `PRIVATE` : If true, code is available to all of the xnu kernel and is
   not available in kernel extensions and user level header files.  The
   header files installed in all the paths described above in (1) will not
   have code enclosed within this macro.

b. `KERNEL_PRIVATE` : If true, code is available to all of the xnu kernel and Apple
    internal kernel extensions.

c. `BSD_KERNEL_PRIVATE` : If true, code is available to the xnu/bsd part of
   the kernel and is not available to rest of the kernel, kernel extensions
   and user level header files.  The header files installed in all the
   paths described above in (1) will not have code enclosed within this macro.

d. `KERNEL` :  If true, code is available only in kernel and kernel
   extensions and is not available in user level header files.  Only the
   header files installed in following paths will have the code -

        $(DSTROOT)/System/Library/Frameworks/Kernel.framework/Headers
        $(DSTROOT)/System/Library/Frameworks/Kernel.framework/PrivateHeaders

   You should check the Testing the kernel section below for details.
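As an illustration of these macros, a hypothetical header (file and function names invented) might expose different declarations at each level:

```c
/* foo.h -- hypothetical header; names invented for illustration.
 * The macros below are defined (or not) by the xnu build and header
 * install machinery, which strips the guarded code where appropriate. */
#ifndef _FOO_H_
#define _FOO_H_

int foo_public(int x);            /* visible at every level */

#ifdef KERNEL
int foo_kext(int x);              /* kernel and kernel extensions only */
#endif /* KERNEL */

#ifdef KERNEL_PRIVATE
int foo_apple_internal(int x);    /* xnu and Apple-internal kexts */
#endif /* KERNEL_PRIVATE */

#ifdef BSD_KERNEL_PRIVATE
int foo_bsd_internal(int x);      /* xnu/bsd only */
#endif /* BSD_KERNEL_PRIVATE */

#endif /* _FOO_H_ */
```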

How to add a new syscall

Testing the kernel

The XNU kernel has multiple mechanisms for testing.
  • Assertions - The DEVELOPMENT and DEBUG kernel configs are compiled with assertions enabled. This allows developers to easily test invariants and conditions.
  • XNU Power On Self Tests (XNUPOST): The XNUPOST config allows for building the kernel with a basic set of test functions that are run before the first user-space process is launched. Since XNU is a hybrid of Mach and BSD, we have two locations where tests can be added.
    xnu/osfmk/tests/     # For testing mach based kernel structures and apis.
    bsd/tests/           # For testing BSD interfaces.
    
    Please follow the documentation at osfmk/tests/README.md
  • User level tests: The tools/tests/ directory holds all the tests that verify syscalls and other features of the xnu kernel. The make target xnu_tests can be used to build all the tests supported.
    $ make RC_ProjectName=xnu_tests SDKROOT=/path/to/SDK
    
    These tests are individual programs that can be run from Terminal and report test status by means of standard POSIX exit codes (0 -> success) and/or stdout. Please read the detailed documentation in tools/tests/unit_tests/README.md

Kernel data descriptors

XNU uses different data formats for passing data in its APIs. The most standard way is using syscall arguments, but for complex data it often relies on sending memory laid out as C structs. This packaged data transport mechanism is fragile and leads to broken interfaces between user-space programs and kernel APIs. The libkdd directory holds a user-space library that can parse custom data provided by the same version of the kernel. The kernel chunked data format is described in detail in libkdd/README.md.

Debugging the kernel

The xnu kernel supports debugging with a remote kernel debugging protocol (kdp). Please refer to the documentation in Technical Note TN2063. By default, the kernel is set up to reboot on a panic. To debug a live kernel, the kdp server is set up to listen for UDP connections over ethernet. For machines without an ethernet port, this behavior can be altered with kernel boot-args. Following are some common options:
  • debug=0x144 - sets up debug variables to start the kdp debug server on panic
  • -v - print kernel logs on screen. By default XNU only shows grey screen with boot art.
  • kdp_match_name=en1 - Override default port selection for kdp. Supported for ethernet, thunderbolt and serial debugging.
To debug a panicked kernel, use the LLVM debugger (lldb) along with the unstripped, symbol-rich kernel binary.
sh$ lldb kernel.development.unstripped
And then you can connect to the panicked machine with the kdp_remote [ip addr] or gdb_remote [hostip : port] commands.
Each kernel is packaged with kernel-specific debug scripts as part of the build process. For security reasons, these special commands and scripts do not get loaded automatically when lldb is connected to a machine. Please add the following setting to your ~/.lldbinit if you wish to always load these macros:
settings set target.load-script-from-symbol-file true
The tools/lldbmacros directory contains the source for each of these commands. Please follow the README.md for a detailed explanation of the commands and their usage.