
[Houdini] [UE] Donut Effect

Final result:

https://vimeo.com/820751652?share=copy

Preparing the Assets

The donut is made of individual boxes distributed along the normals of a ring, so my idea was to first get every point from a circle and then generate a corresponding box at each point.

However, when a box is copied onto a point it gets rotated according to that point's normal, so an up vector has to be defined for each point's normal. In addition, the box's width and height need to change with the size of the ring, so I use the distance from each point to its nearest neighboring point to compute the short edge of the box at that point.
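A minimal numpy sketch of that idea, outside of Houdini and just to illustrate the math (the point count and ring radius are placeholder values):

```python
import numpy as np

n_points = 32          # placeholder point count on the ring
radius = 1.0           # placeholder ring radius

# Points on a circle in the XZ plane; the normal at each point faces outward.
theta = np.linspace(0, 2 * np.pi, n_points, endpoint=False)
points = np.stack([radius * np.cos(theta),
                   np.zeros_like(theta),
                   radius * np.sin(theta)], axis=1)
normals = points / np.linalg.norm(points, axis=1, keepdims=True)

# A constant up vector keeps the copied boxes from twisting around the normal.
up = np.tile(np.array([0.0, 1.0, 0.0]), (n_points, 1))

# Distance to the nearest neighboring point decides the short edge of each box,
# so the boxes shrink or grow with the size of the ring.
dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
np.fill_diagonal(dists, np.inf)
short_edge = dists.min(axis=1)
```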

Then, following the Pivot Painter documentation, each piece of geometry in the prepared asset needs a corresponding point, and the point and the geometry must share an attribute called name so the tool knows which pivot belongs to which geometry. The next step is to create the name attribute on every point and then copy it onto the geometry instanced at that point.

Vertex Animation

First, the color on the outside of the object needs to be distinguished from the color on the inside. This can be done by checking whether each face's normal points in the same direction as the normal at the object's pivot point (same direction means outside, opposite direction means inside). A dot product does this: a · b > 0 means the two vectors point the same way, < 0 means they point in opposite directions.
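A quick sketch of that dot-product test (plain numpy, with made-up vectors):

```python
import numpy as np

def is_outside(face_normal, pivot_normal):
    # > 0: the face normal points the same way as the pivot normal -> outside
    # < 0: opposite direction -> inside
    return np.dot(face_normal, pivot_normal) > 0

print(is_outside(np.array([0, 1, 0]), np.array([0, 1, 0])))   # True  (outside)
print(is_outside(np.array([0, -1, 0]), np.array([0, 1, 0])))  # False (inside)
```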

Next, the vertex animation itself: each slab rotates according to its own normal direction, like a chopstick stuck into a block of tofu where you spin the chopstick but the tofu itself is not rotating. So the surface rotation effect can be achieved by rotating the normal.

I want the normal to start with some offset and then trace out a circle around the original normal direction over time. To do this, first find two vectors a and b that are both perpendicular to the normal, compute the circular-motion offset as sin(time) * a + cos(time) * b, and then set the axis to the original normal plus this offset.

That is: f(time) => axis = normalize(normal + sin(time) * a + cos(time) * b)

So a cross product is used to find the two vectors a and b, and then the axis is computed from time. To give each layer of the object a slightly different rotation, the pivot point's z value can be used to offset time, so that the whole object appears to rotate layer by layer from bottom to top.
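Putting the last three paragraphs together, a hedged numpy version of the axis computation might look like this (the world-up reference, z_bias, and amplitude are illustration knobs, not values from the actual material):

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def wobble_axis(normal, time, pivot_z, z_bias=0.5, amplitude=0.3):
    """Rotation axis = original normal plus a circular offset over time."""
    normal = normalize(normal)
    # Two vectors perpendicular to the normal, found with cross products
    # (fine here because the ring normals are roughly horizontal).
    world_up = np.array([0.0, 0.0, 1.0])   # assumed up reference
    a = normalize(np.cross(normal, world_up))
    b = normalize(np.cross(normal, a))
    # Offsetting time by the pivot's z value makes each layer rotate differently.
    t = time + pivot_z * z_bias
    offset = amplitude * (np.sin(t) * a + np.cos(t) * b)
    return normalize(normal + offset)
```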

Finally, all of the pieces are scaled up or down based on the distance from the camera to the pivot point. The idea is to use a 0-1 value for the overall scale of the object: at 0 everything collapses onto the pivot point, and at 1 every vertex stays where it is. This uses the vector v from each vertex to the pivot point: while the vertices animate, they also add v * normalized_camera_distance (0-1). The problem is that the earlier animation can leave vertices away from their original positions, so when scaling down to 0 a small piece may remain visible. Therefore, when normalized_camera_distance is 0, the other parameters (rotation_speed, z_bias, rotation_angle) must also be zeroed out so that every box disappears cleanly.
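One consistent reading of that description, as a minimal sketch: lerp each vertex between the pivot point and its animated position, assuming the rotation parameters have already been zeroed out when the scale reaches 0.

```python
import numpy as np

def scale_to_pivot(vertex_pos, pivot, normalized_camera_distance):
    """0 -> every vertex sits on the pivot point; 1 -> vertices stay in place."""
    s = np.clip(normalized_camera_distance, 0.0, 1.0)
    # Equivalent to pivot + v * s, with v the vector from the pivot to the vertex.
    return pivot + (vertex_pos - pivot) * s
```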

The final material:

Some Fun Accidents

A lot of strange bugs showed up along the way, but this one looked fun, so I'm recording it here.

The material is attached. I believe the *1.1 node is what offsets the rotation center of the whole object, which produces a fun scatter-and-reassemble effect. Turning it even higher gives more of an explode-and-reconstruct look.

References:

https://docs.unrealengine.com/4.27/zh-CN/Resources/SampleGames/

https://docs.unrealengine.com/4.26/en-US/AnimatingObjects/PivotPainter/PivotPainter1/

First Try of PBR Shader and Effects

This is a sample Unity PBR shader with demos of rim color and dissolve effects.

Standard PBR Shader

This standard shader is built in Unity URP and combines a main texture, normal map, metallic map, emission, and ambient occlusion into a standard PBR shader.

Rim Effects

The rim effect is accomplished with the Fresnel node: the parts close to the edge of the model (grazing view angles) return a value near 1, while the parts facing the camera return 0. The power parameter controls how sharp this transition is, so it can be used to control the intensity of the rim color.
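The same falloff can be written out in a few lines. This is a numpy sketch of the math behind the Fresnel node, not the Shader Graph node itself; the rim color and power are made-up values.

```python
import numpy as np

def fresnel_rim(normal, view_dir, power=4.0):
    """~1 near grazing angles (model edge), ~0 where the surface faces the camera."""
    n = normal / np.linalg.norm(normal)
    v = view_dir / np.linalg.norm(view_dir)
    return (1.0 - np.clip(np.dot(n, v), 0.0, 1.0)) ** power

rim_color = np.array([0.0, 0.8, 1.0])            # hypothetical rim tint
emission = rim_color * fresnel_rim(np.array([0.0, 0.0, 1.0]),
                                   np.array([0.6, 0.0, 0.8]))
```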

Dissolve Effects

The dissolve effect is achieved by applying an alpha mask and controlling the clipping value with an alpha test: areas whose mask value is lower than the alpha threshold are not rendered. The dissolved edge color is achieved by intensifying a noise map (e.g. adding a constant to the noise) and adding it to the emissive color, so the Fresnel rim and the dissolve emissive color can be handled separately.
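A matching per-fragment sketch of the dissolve logic, again just the math: noise_value would come from the noise texture, and edge_boost, threshold, and edge_color are illustration values; the edge mask here is one common way to get a glowing band just above the clip threshold.

```python
import numpy as np

def dissolve(noise_value, threshold, edge_boost=0.2,
             edge_color=np.array([1.0, 0.4, 0.0])):
    """Clip fragments below the alpha threshold; tint the surviving edge."""
    if noise_value < threshold:
        return None                       # fragment is clipped (not rendered)
    # Intensified noise (noise + constant) drives the emissive edge color.
    edge_mask = np.clip(threshold + edge_boost - noise_value, 0.0, 1.0)
    return edge_color * edge_mask
```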

Report of OCR Spell Check Independent Study

Introduction

What is OCR:

OCR stands for optical character recognition, the technique of converting a pixelated image into words and characters for better readability and interpretability. Some old books are only available in scanned versions, so some words may be hard to read, and the books are difficult to preserve in digitized form. In this independent study with Prof. Jill Naiman, we explored different ways to improve the readability and correctness of OCR data so that it reads better to humans. An example of an OCR'd page is shown in the figure above.

The major issues we found with current OCR data are missing characters, punctuation, and line feeds, as well as grammar errors. For the OCR engine, we chose Google's OCR engine Tesseract, as implemented by the Python wrapper pytesseract (https://pypi.org/project/pytesseract/).

Our examination and exploration covered different methods and modules, such as using a spell checker (https://github.com/filyp/autocorrect), using the GROBID data format to parse PDFs, building our own dictionary for the topic, and using metrics to quantify the word error rate (WER, how many words are parsed incorrectly relative to the ground truth) and the character error rate (CER, how many characters are parsed incorrectly).

Our goal is to digitize old scanned papers and convert them to text, and potentially to host figures + captions at AIE (http://www.astroexplorer.org/).

Methods

The first method we tried was to run a spell check on the OCR text with the package autocorrect (https://github.com/filyp/autocorrect). Because OCR text generally has small spelling errors or missing characters, the spell checker can often detect those errors and fix them with the most common phrases and expressions in English. However, sometimes the error is not so obvious, and a single word may have many letters wrong. Also, a spell checker can sometimes change the meaning of a sentence, which is not acceptable for scientific writing.
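A minimal usage sketch of that package (the example sentence and its misread character are invented):

```python
from autocorrect import Speller

spell = Speller(lang="en")

# A typical OCR-style error: a single character misread.
print(spell("the spectrum of the galaxv was measured"))
# -> "the spectrum of the galaxy was measured" (if the correction succeeds)
```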

Our second approach was to produce GROBID data so that the output is more structured and carries metadata. GROBID is short for GeneRation Of BIbliographic Data, a tool that parses vector PDF documents and saves the output in a notation similar to the TEI/XML format.

OCR'd pages sometimes have hyphens at the end of a line when a word wraps, whereas the GROBID-parsed format keeps the whole word together. Spell checkers sometimes treat the hyphenated fragments as spelling errors and try to fix them, but with the metadata information we can concatenate the two parts back together.
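A small sketch of that de-hyphenation step, assuming the metadata tells us which line-end hyphens are wraps rather than real hyphenated compounds:

```python
import re

def join_wrapped_words(ocr_text):
    """Join words that were split across lines with a trailing hyphen."""
    # "spec-\ntrum" -> "spectrum"; real hyphenated compounds need metadata to decide.
    return re.sub(r"(\w+)-\n(\w+)", r"\1\2", ocr_text)

print(join_wrapped_words("the spec-\ntrum was observed"))
# -> "the spectrum was observed"
```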

The other approach was to build a personal dictionary of frequent words in astronomy data. Here we used the Python module pyenchant (the Python binding for the C Enchant library) and its PyPWL class, which lets us build our own dictionary from the GROBID-parsed data extracted from vector PDFs.
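The study used pyenchant's PyPWL class; this sketch uses pyenchant's file-backed personal word list interface (enchant.request_pwl_dict), which behaves similarly. The file name and word list below are made up.

```python
import enchant

# Write a tiny personal word list (one word per line); in the real run this
# file would hold the tokens taken from GROBID-parsed astronomy papers.
with open("astro_words.txt", "w") as f:
    f.write("spectroscopy\nredshift\nphotometric\nquasar\n")

pwl = enchant.request_pwl_dict("astro_words.txt")
print(pwl.check("redshift"))     # True
print(pwl.suggest("redshfit"))   # suggestions drawn only from our own word list
```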

We piped in tokenized words selected from other similar scientific documents, as produced by GROBID, to build our dictionary. The first pass used about a million words (roughly 100 thousand after removing repeated words in different tenses and some non-alphabetic characters). The dictionary was built surprisingly fast, and its word suggestions fit the scientific domain better than our first spell-checker method. However, the module slows down drastically for longer words: by our rough estimate, common words of fewer than 4-5 letters take less than one second, while 10-letter words take 10-20 seconds and the predictions get worse.

With further research, we found that the personal-dictionary suggestions in enchant are based on word edit distance (Levenshtein distance), the minimum number of edits needed to transform one word into another. This may explain why the dictionary runs longer for longer words.

The next task used the HathiTrust and Gutenberg datasets to quantify error rates (A Prototype Gutenberg-HathiTrust Sentence-level Parallel Corpus for OCR Error Analysis: Pilot Investigations, Ming). This approach helped us quantify how well OCR transfers information. We take the same document from both HathiTrust (images, PDFs, OCR'd data) and Gutenberg (human-typed ground truth), and compute both the word error rate and the character error rate across these datasets.
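A simple way to compute both rates is to apply the Levenshtein distance from the previous paragraph at the word level and at the character level (the example strings below are invented; libraries such as jiwer do the same thing, this just spells out the definition):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (of words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (r != h)))    # substitution
        prev = curr
    return prev[-1]

def wer(ground_truth: str, ocr_text: str) -> float:
    ref = ground_truth.split()
    return edit_distance(ref, ocr_text.split()) / len(ref)

def cer(ground_truth: str, ocr_text: str) -> float:
    return edit_distance(ground_truth, ocr_text) / len(ground_truth)

# Hypothetical Gutenberg (ground truth) vs. HathiTrust (OCR) snippet:
print(wer("the stars are bright", "the stars arc bright"))   # 0.25
print(cer("the stars are bright", "the stars arc bright"))   # 0.05
```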

The outcome was that the OCR data tends to have a high character error rate (CER).

Our last approach, which we did not quite finish, was to use a Python module named ocr-post-correction to train and test models for spelling and grammar correction (https://github.com/shrutirij/ocr-post-correction). We split the documents into train and test sets; since this is supervised machine learning, we used the HathiTrust side of the dataset as the training input and the Gutenberg side as the training target. We then intended to test on some of the HathiTrust data to see how well the method performs. Unfortunately, we were not able to run the full training procedure, as GPU training was not fully supported by the current codebase on Google Colab.

Conclusion and Future Work

Through this semester's independent research on OCR and post-OCR correction methods, we worked with several kinds of corrector modules in Python to minimize errors in OCR'd text. These methods help make OCR data more accurate, but each still has room for improvement. To digitize a larger corpus of documents, we still need to work on implementing different deep learning models and methods for OCR correction.