Measure Zero

Customizing Code Completion in TeXworks

2020-01-08 | ~ | Language

I have used TeXworks for too many years and do not really want to switch IDEs, so I am sticking with it.

File location: in the TeXworks menu bar, “Help” -> “Settings and Resources” -> “Resources” -> the “completion” folder -> the file “tw-latex.txt”.

Syntax

<alias>:=<text>

The <alias>:= part can be omitted to turn the code text into its own alias. <text> must fit in a single line. Empty lines and lines starting with a % are ignored.

The first sentence means that writing just blahblah is equivalent to writing blahblah:=blahblah.

Consecutive spaces in <text> are significant.

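#RET# stands for return, i.e. a line break.
#INS# stands for insert: the cursor is placed at this position.
• (the bullet character) is a placeholder; press <Ctrl>+<Tab> to move the cursor to the next placeholder.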
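
To make the rules above concrete, here is a minimal Python sketch that reads one line of a completion file and expands its markers. It only illustrates the syntax described here and is not how TeXworks itself is implemented; the function names `parse_entry` and `expand` and the sample alias `bfig` are hypothetical and are not taken from tw-latex.txt.

```python
def parse_entry(line):
    """Return (alias, text) for one completion line, or None if the line is ignored."""
    line = line.rstrip("\n")
    # Empty lines and lines starting with % are ignored.
    if not line.strip() or line.startswith("%"):
        return None
    alias, separator, text = line.partition(":=")
    if not separator:
        # No "<alias>:=" part: the text is its own alias.
        return line, line
    return alias, text


def expand(text):
    """Return (inserted_text, cursor_index); cursor_index is None without #INS#."""
    inserted = text.replace("#RET#", "\n")
    cursor = inserted.find("#INS#")
    if cursor == -1:
        return inserted, None
    return inserted.replace("#INS#", "", 1), cursor


# Hypothetical entry: typing "bfig" would complete to a figure environment.
sample = r"bfig:=\begin{figure}#RET##INS##RET#\end{figure}"
alias, text = parse_entry(sample)
inserted, cursor = expand(text)
```

Here `inserted` holds three lines (the `\begin{figure}` line, an empty line, and the `\end{figure}` line), and `cursor` points at the start of the empty middle line.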

A Wrong Way to Do Cross-Validation

2019-12-12 | ~ | Machine Learning

While this point may seem obvious to the reader, we have seen this blunder committed many times in published papers in top rank journals.

Consider a classification problem with a large number of predictors, as may arise, for example, in genomic or proteomic applications. A typical strategy for analysis might be as follows:

1. Screen the predictors: find a subset of “good” predictors that show fairly strong (univariate) correlation with the class labels.
2. Using just this subset of predictors, build a multivariate classifier.
3. Use cross-validation to estimate the unknown tuning parameters and to estimate the prediction error of the final model.
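
The problem with this strategy is that the screening step has already seen the labels of every sample, including the ones later held out. The following is a small simulation sketch of that effect, under assumptions of my own: pure-noise Gaussian predictors, labels independent of the predictors (so the true error rate is 50%), a nearest-centroid classifier standing in for the multivariate classifier, and 5 folds. All function names and sizes are illustrative.

```python
import random


def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def screen(X, y, rows, keep):
    """Indices of the `keep` predictors most correlated with y, using only `rows`."""
    labels = [y[i] for i in rows]
    scores = []
    for j in range(len(X[0])):
        column = [X[i][j] for i in rows]
        scores.append((abs(correlation(column, labels)), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:keep]]


def centroid_errors(X, y, train, test, cols):
    """Number of misclassified `test` rows for a nearest-centroid rule fit on `train`."""
    centroids = {}
    for c in (0, 1):
        members = [i for i in train if y[i] == c]
        centroids[c] = [sum(X[i][j] for i in members) / len(members) for j in cols]
    errors = 0
    for i in test:
        dist = {c: sum((X[i][j] - m) ** 2 for j, m in zip(cols, centroids[c]))
                for c in (0, 1)}
        errors += min(dist, key=dist.get) != y[i]
    return errors


def cv_error(X, y, folds, keep, screen_inside):
    """Cross-validated error rate; screening is redone per fold iff screen_inside."""
    rows = list(range(len(y)))
    global_cols = None if screen_inside else screen(X, y, rows, keep)
    errors = 0
    for test in folds:
        train = [i for i in rows if i not in test]
        cols = screen(X, y, train, keep) if screen_inside else global_cols
        errors += centroid_errors(X, y, train, test, cols)
    return errors / len(y)


rng = random.Random(0)
n, p, keep = 50, 1000, 20
y = [i % 2 for i in range(n)]                       # labels carry no signal
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
order = list(range(n))
rng.shuffle(order)
folds = [order[k::5] for k in range(5)]

wrong_error = cv_error(X, y, folds, keep, screen_inside=False)  # screen once, on all data
right_error = cv_error(X, y, folds, keep, screen_inside=True)   # screen inside each fold
print(f"screen outside CV: {wrong_error:.2f}, screen inside CV: {right_error:.2f}")
```

Since the labels are independent of the predictors, an honest estimate should sit near 0.5. Screening outside the folds tends to report a much smaller error, because the selected predictors were chosen for their chance correlation with the held-out labels too.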

Side Note: Information Entropy, Cross-Entropy and KL Divergence

2019-11-30 | ~ | Mathematics

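Consider an event $A$ that occurs with probability $p$. Suppose we observe that $A$ occurs; we would like to define an information content $I(p)$ that measures how much information the fact “$A$ occurred” gives us.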

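$I(p)$ is a decreasing function of $p$. If an event has high probability and it does occur, we should gain little information, since we already believed it was likely and its occurrence is no surprise.
Consider another event $B$, independent of $A$, that occurs with probability $q$; then we require $I(pq) = I(p) + I(q)$. That is, when independent events occur together, the information they provide should be the sum of the information each provides on its own.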
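
One function with both properties is $I(p) = -\log p$, where the base of the logarithm only fixes the unit. As a quick numerical check, here is a sketch assuming base 2 (information measured in bits); the function name `information` is my own.

```python
import math


def information(p):
    """Information content, in bits, of observing an event of probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    return -math.log2(p)


p, q = 0.5, 0.25
joint = information(p * q)                    # independent events A and B both occur
separate = information(p) + information(q)    # information of A plus information of B
print(joint, separate)
```

The two printed values agree, and `information(0.9)` is smaller than `information(0.1)`, matching the requirement that likelier events carry less information.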

A Hand-Ground Coffee Experience

2019-11-21 | ~ 2020-12-03 | Food and Cooking

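Last Saturday (2019/11/16) I tried making hand-ground coffee at the café in the north area of campus; here is a brief record. It is also called pour-over coffee.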

1. The process of making hand-ground coffee

The equipment provided by the organizers is shown in the picture.

An Example Where the Bootstrap Fails

2019-11-08 | ~ | Statistics

Suppose $Y_1, \dots, Y_n$ are independent and identically distributed, following the uniform distribution on $[0,\theta]$. Then the likelihood function is

\[L(\theta|Y_1, \dots, Y_n) = \frac{1}{\theta^n} \prod_{k=1}^n 1_{\{ 0\le Y_k\le \theta \}}.\]
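
This likelihood is zero for $\theta < \max_k Y_k$ and decreasing in $\theta$ beyond that, so the maximum likelihood estimate is $\hat\theta = \max_k Y_k$. A standard way to see trouble for the nonparametric bootstrap in this setting (presumably the failure the title refers to) is that a resample contains the sample maximum with probability $1 - (1 - 1/n)^n \approx 0.632$, so the bootstrap distribution of $\hat\theta$ has a large atom at $\hat\theta$ itself. The simulation below is a sketch under my own choice of $n$, number of replicates, and seed.

```python
import random


def bootstrap_tie_fraction(n=50, replicates=2000, theta=1.0, seed=0):
    """Fraction of bootstrap resamples whose maximum equals the sample maximum."""
    rng = random.Random(seed)
    sample = [rng.uniform(0, theta) for _ in range(n)]
    mle = max(sample)                      # MLE of theta for Uniform[0, theta]
    ties = 0
    for _ in range(replicates):
        resample = [rng.choice(sample) for _ in range(n)]   # draw n with replacement
        ties += max(resample) == mle
    return ties / replicates


fraction = bootstrap_tie_fraction()
expected = 1 - (1 - 1 / 50) ** 50          # probability the maximum is resampled
print(f"observed {fraction:.3f}, expected about {expected:.3f}")
```

The true sampling distribution of $\hat\theta$ is continuous and puts no mass on any single point, so an atom of this size does not shrink as $n$ grows and the bootstrap distribution cannot approximate it.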

Two Notes on the Median: Linear Time and LeetCode 4

2019-10-21 | ~ 2020-06-07 | Algorithms

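The most brute-force way to find the median is to sort first and then take the middle element, with time complexity $O(n\log n)$. I later learned that the median can be found by an algorithm with time complexity $O(n)$; in fact, any order statistic can be found in $O(n)$ time.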
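
As an illustration of selecting an order statistic without sorting, here is a sketch of randomized quickselect. Note the swap: its running time is $O(n)$ only in expectation, whereas a worst-case $O(n)$ bound needs a deterministic pivot rule such as median of medians, which is not shown here. The function names are my own.

```python
import random


def quickselect(values, k, rng=random):
    """Return the k-th smallest element (0-based) of a non-empty sequence."""
    if not 0 <= k < len(values):
        raise IndexError("k out of range")
    items = list(values)
    while True:
        pivot = rng.choice(items)
        lows = [v for v in items if v < pivot]
        highs = [v for v in items if v > pivot]
        n_equal = len(items) - len(lows) - len(highs)
        if k < len(lows):
            items = lows                    # answer lies among the smaller elements
        elif k < len(lows) + n_equal:
            return pivot                    # answer equals the pivot
        else:
            k -= len(lows) + n_equal        # skip everything up to the pivot
            items = highs


def median(values):
    """Lower median of a non-empty sequence."""
    return quickselect(values, (len(values) - 1) // 2)
```

Each round discards the side that cannot contain the answer, so the expected total work is a geometric sum proportional to $n$.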

Lights-Out

2019-09-14 | ~ | Mathematics

Each employee of MegaCorp has a separate office in the MegaCorp office building. Each office is equipped with one overhead light and one toggle switch to turn the light on and off.

Every day, the employees turn on all lights when they come to work. Each evening they turn off all lights when they go home.

One day, the employees arrive to discover that someone has played a rather elaborate hoax on them. Though all looks fine when they come in (all lights are off), every time an employee flicks the switch in her office, this not only toggles the light in her office, but also the lights in the offices of all of her friends. (Friendship is a symmetric relationship.)

The question: does there necessarily exist an arrangement of the switches that will turn all lights simultaneously on (so that work can begin)? Prove your answer.
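
Before attempting a proof, small cases can be checked by brute force. The sketch below enumerates every symmetric friendship graph on up to 5 offices and searches all subsets of switches; it is evidence for small cases only, not a proof, and the function names are my own.

```python
from itertools import combinations


def can_light_all(n, friendships):
    """True if some set of switch presses turns on all n lights (all start off)."""
    # toggles[i] is a bitmask of the lights flipped by employee i's switch.
    toggles = [1 << i for i in range(n)]
    for i, j in friendships:
        toggles[i] |= 1 << j
        toggles[j] |= 1 << i
    target = (1 << n) - 1
    # Pressing a switch twice cancels out, so each switch is pressed 0 or 1 times.
    for presses in range(1 << n):
        state = 0
        for i in range(n):
            if (presses >> i) & 1:
                state ^= toggles[i]
        if state == target:
            return True
    return False


def all_graphs(n):
    """Yield every set of symmetric friendships among n employees."""
    pairs = list(combinations(range(n), 2))
    for mask in range(1 << len(pairs)):
        yield [pair for bit, pair in enumerate(pairs) if (mask >> bit) & 1]


always_solvable = all(
    can_light_all(n, graph) for n in range(1, 6) for graph in all_graphs(n)
)
print(always_solvable)
```

The search treats the lights as a vector over the integers mod 2, which is also a natural setting for a proof.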

Super Egg Drop

2019-09-13 | ~ | Mathematics

You are given $k$ eggs, and you have access to a building with $N$ floors from $1$ to $N$.

Each egg is identical in function, and if an egg breaks, you cannot drop it again.

You know that there exists a floor $F$ with $0 \le F \le N$ such that any egg dropped at a floor higher than $F$ will break, and any egg dropped at or below floor $F$ will not break.

Each move, you may take an egg (if you have an unbroken one) and drop it from any floor $X$ (with $1 \le X \le N$).

Your goal is to know with certainty what the value of $F$ is.

What is the minimum number of moves that you need to know with certainty what $F$ is, regardless of the initial value of $F$?
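
One common way to compute the answer is to turn the question around: with $m$ moves and $e$ eggs, the largest number of floors that can be resolved satisfies $f(m, e) = f(m-1, e-1) + f(m-1, e) + 1$ (the drop either breaks the egg or not, plus the floor just tested). The sketch below applies this recurrence; the function name and the in-place array are my own choices, and other formulations of the dynamic program exist.

```python
def super_egg_drop(k, n):
    """Minimum number of moves that always determines F with k eggs and n floors."""
    if k < 1 and n > 0:
        raise ValueError("at least one egg is needed")
    # floors[e] = number of floors resolvable with e eggs in the current number of moves.
    floors = [0] * (k + 1)
    moves = 0
    while floors[k] < n:
        moves += 1
        # Go from k down to 1 so floors[e - 1] still holds the previous move's value.
        for e in range(k, 0, -1):
            floors[e] = floors[e] + floors[e - 1] + 1
    return moves
```

For example, `super_egg_drop(1, 2)` is 2, since with a single egg the floors must be tried one by one from the bottom.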