Python Notes

Must Watch!




Scheduling Tasks in Python javatpoint R Tutorial Modern Graphical User Interfaces in Python code 3D Engine in Python. OpenGL Pygame Tutorial code 3D Engine in Python from Scratch pythonTutorial python-Tutorial Super-powerful AI image generation (超强的AI作画)
C:\Users\william\AppData\Local\Programs\Python\Python310\python.exe
C:\ai

https://www.freedidi.com/6727.html

git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git

https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Dependencies

PyMuPDF Gradient descent: principles, code, and debugging Mastering automated machine learning AutoML: PyCaret Recognition in a single line of code Web scraping methods Cracking WiFi passwords with Python Altair: Declarative Visualization The 3 main steps of Python data visualization thonny Python IDE Python Website Full Tutorial - Flask pyecharts Building Command Line Interfaces with Python Generator Functions 40 Python charts Python library for rich text in the terminal: Rich Core Pandas operations A set of Data analysis tools filebrowser provides a file managing interface Top 47 Machine Learning Projects A simple, practical pandas trick: how to cut memory usage by 90% 75 high-frequency Pandas operations The complete Python data-processing toolkit code used for Automating a Supply Chain with Machine Learning, AWS, and Python automate supply chain Tech With Tim to Convert any Python File to .EXE Python compiler into exe Writing Python Extensions in Assembly samples
Python Website Full Tutorial - Flask, Authentication, Databases & More pythonVideos Tech With Tim Python for Everybody pythonTut Python Urllib GET Requests
Spotting and solving everyday problems with machine learning recordlinkage
Code with Mu Thonny Python IDE CircuitPython codespeedy python samples Python tutorial and other tutorials
Python Search Database Table Trinket run and write code in browser python fu gimp scripting tutorial Python Blender tutorial Inkscape PythonEffect Tutorial Python data analysis tutorials


Major Contents

Python Object Oriented Programming (OOP) Python 100 Days Learn Python by Building Five Games - Full Course Python Tutorial for Beginners Build a Game with Python Python OOP教程1:类和实例 ♦python programming examples Python Examples . Python的GUI编程 . 阮一峰的网络 . get webpage with python Python Programming Tutorials . SymPy Geometry Module python samples . ♦python print . ♦Python Data Science Handbook . Python NLP入门教程 feature engineering for nlp in python . ♦python 爬虫技巧 ♦Python Numpy Tutorial ♦Python Numpy Tutorial1 . ♦Python数据分析 Numpy . ♦Numpy . ♦Numpy reshape CodeSkulptor . Python & pip Windows installation . 快速学习Python的技巧 Python Classes and Objects Python Tutorial . ♦pythonTutorial . ♦pythonTutorial cards . ♦pythonspotTutor . Python ♦python Reference . ♦python Mathematical functions . Python 2.7 Quick Reference ♦Pythontips Python Tips ♦PythonLibHints Tips . ♦Python Built-in Functions Tips . ♦python Input and Output . python print Python Quick Reference Card . ♦python Mathematical functionsList . ♦Datacamp Tutorials ♦programizList Python Tutorial . ♦Python Libraries . Python数据分析库Pandas . ♦Python Quick ReferenceScript ♦Python Quick ReferenceList . ♦Python 3 – Quick Reference Card . ♦Python Built-in Functions ♦Python NLP入门教程 . ♦Altair . Python数据分析 linear-optimization-in-python Creating and Viewing HTML Files with Python . 掌握 Python MyPy Python App Tracks Amazon Prices! PyQt5 Tutorial - Setup and a Basic GUI Application Python GUI's with PyQt5 PyQt5 Tutorial - Use Qt Designer code and a written tutorial: https://nitratine.net/blog/post/python-guis-with-pyqt/ Auto Py to Exe: https://youtu.be/OZSZHmWSOeM Convert PY to EXE: https://youtu.be/lOIJIk_maO4 PyQt5 and pyqt5-tools support Python 3.5 - 3.7. designer.exe has now been moved to pyqt5_tools\Qt\bin but can also be found in the \Scripts\ folder in the root of your Python installation. Installing scipy . SciPy Tutorial . Basic functions . ♦Using Python for Android and QPython Python Regex Cheatsheet . ♦30 Python Projects . Make the Most of Python’s Communities Python开发者 . Python编程 . Pythoner每日一报 . Python小屋 . 人人可以学Python 机器学习算法与Python学习 . Python实现机器学习算法 Introducing a simple and intuitive Python API for UCI machine learning repository 7 methods to perform Time Series forecasting (with Python codes) pyscripter home . pyscripter.pdf . pyscript intro . pyscript.pdf . pyscript gallery Python Tutorial: Automate Parsing and Renaming of Multiple Files Python Tutorial for Beginners 1: Install and Setup for Mac and Windows Turning Sublime Text Into a Lightweight Python IDE Lesson 1 - Python Programming (Automate the Boring Stuff with Python) Task Automation with Python Scripting 6 种 Python 数据可视化工具 从入门到上手写脚本/爬数据/搭网站,有哪些快速学习Python的技巧 Scikit-learn中文文档:步入机器学习的完美实践教程 94页论文综述卷积神经网络:从基础技术到研究前景 介绍机器学习中基本的数学符号 Deep Learning from first principles in Python, R and Octave – Part 8 A weird introduction to Deep Learning Google’s AutoML will change how businesses use Machine Learning Python: party with Strings How to Read Outlook Emails by Python trained a language detection AI in 20 minutes Logistic Regression Algorithm Renko brick size optimization DNS 解析器性能比较:Google、CloudFlare、Quad9 . 直白介绍卷积神经网络(CNN) Python Tutorial: Using Try/Except Blocks for Error Handling intro-to-python-for-data-science datacamp onboarding data-scientist-with-python building-chatbots-in-python ♦Data Structures in Python R vs Python – a One-on-One Comparison What is the difference between Python and R language? 
Introduction to Data Processing with Python Classes and Methods In Python Data Science for Startups: R -> Python A Complete Machine Learning Project Walk-Through in Python: Part One Top 10 Python Programming Tricks turtle graphic Fuzzy String Matching in Python python 10 tips ♦Python 3 入门 ♦Setting Up Sublime Text 3 for Python Development Convert PY to EXE python speech recognition Speech Recognition Python face recognition python gradebook exercise
python battleship exercise
python Bitwise operations exercise
python dictionary exercise
python exercise
python iterations exercise
python list exercise
python List of list exercise
python looping exercise
python practise exercise
python stat exercise Everything About Python — Beginner To Advanced 5 Python Exercises How To Build Future Proof Applications? Programming Python Using Design Patterns
Topic Modeling

Automating My Projects With Python

Classes and Objects Tutorial

Python data types:

int, or integer: a number without a fractional part.
float, or floating point: a number that has both an integer and a fractional part, separated by a point. factor, with the value 1.10, is an example of a float.
str, or string: a type to represent text. You can use single or double quotes to build a string.
bool, or boolean: a type to represent logical values; it can only be True or False (the capitalization is important!).

type() function
To determine the type of a, simply execute: type(a)

Using the + operator to paste together two strings:
print("I started with $" + savings + " and now have $" + result + ". Awesome!")
This will not work, though, as you cannot simply sum strings and floats. To fix the error, you'll need to explicitly convert the types of your variables. More specifically, you'll need str() to convert a value into a string. str(savings), for example, will convert the float savings to a string. Similar functions such as int(), float() and bool() will help you convert Python values into any type.

Manipulating lists: the .append() method, the .extend() method, the .index() method, the .pop() method.

aList = [123, 'xyz', 'zara', 'abc']
aList.append(2009)
print("Updated List : ", aList)

# Create a list containing the names: baby_names
baby_names = ['Ximena', 'Aliza', 'Ayden', 'Calvin']
# Extend baby_names with 'Rowen' and 'Sandeep'
baby_names.extend(['Rowen', 'Sandeep'])
print(baby_names)
# Find the position of 'Aliza': position
position = baby_names.index('Aliza')
# Remove 'Aliza' from baby_names
baby_names.pop(position)
print(baby_names)

Looping over lists: the for loop and sorted(). The sorted() function returns a new list and does not affect the list you passed into the function.

Given records of this form: ['2011', 'FEMALE', 'HISPANIC', 'GERALDINE', '13', '75'], loop over the list of lists and append the name in each record to a new list called baby_names.

baby_names = []
for x in range(0, 3):
    print("We're on time %d" % (x))

fruits = ['banana', 'apple', 'mango']
for fruit in fruits:
    print('Current fruit :', fruit)

extract-first-item-of-each-sublist-in-python

# Create the empty list: baby_names
baby_names = []
# Loop over records
for baby in records:
    # Add the name to the list
    baby_names.append(baby[3])
# Sort the names in alphabetical order and print each name
for name in sorted(baby_names):
    print(name)

Tuples are fixed size in nature whereas lists are dynamic. In other words, a tuple is immutable whereas a list is mutable.

Using and unpacking tuples: tuples are made of several items just like a list, but they cannot be modified in any way. It is very common for tuples to be used to represent data from a database. If you have a tuple like ('chocolate chip cookies', 15) and you want to access each part of the data, you can use an index just like a list. However, you can also "unpack" the tuple into multiple variables, such as type, count = ('chocolate chip cookies', 15), which will set type to 'chocolate chip cookies' and count to 15.
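A minimal, runnable sketch of the type-conversion fix described above; the values of savings and result are made up purely for illustration:

savings = 100.0
result = 100.0 * 1.10 ** 7
# str() converts the floats so they can be concatenated with the surrounding strings
print("I started with $" + str(savings) + " and now have $" + str(result) + ". Awesome!")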
Often you'll want to pair up multiple array data types. The zip() function does just that: it will return a list of tuples containing one element from each list passed into zip(). When looping over a list, you can also track your position in the list by using the enumerate() function. The function returns the index of the list item you are currently on and the list item itself.
You'll practice using the enumerate() and zip() functions in this exercise, in which your job is to pair up the most common boy and girl names. Two lists, girl_names and boy_names, have been pre-loaded into your workspace.
Instructions:
Use the zip() function to pair up girl_names and boy_names into a variable called pairs.
Use a for loop to loop through pairs, using enumerate() to keep track of your position. Unpack pairs into the variables idx and pair.
Inside the for loop: unpack pair into the variables girl_name and boy_name, then print the rank, girl name, and boy name, in that order. The rank is contained in idx. (A sketch of this loop appears after the urllib notes below.)
==========================
get webpage contents with python
Urllib - GET Requests (Python 2):
import urllib2
response = urllib2.urlopen('http://python.org/')
html = response.read()

import urllib2
resp = urllib2.urlopen('http://hiscore.runescape.com/index_lite.ws?player=zezima')
page = resp.read()

Using Python 3.1+ (after import urllib.request):
urllib.request.urlopen('http://www.python.org/')

data-types-for-data-science
==========
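A minimal sketch of the zip()/enumerate() exercise described above; girl_names and boy_names are placeholder lists standing in for the pre-loaded data:

girl_names = ['Emma', 'Olivia', 'Sophia']   # placeholder data
boy_names = ['Noah', 'Liam', 'Mason']       # placeholder data

# Pair the two lists element by element
pairs = zip(girl_names, boy_names)

# enumerate() yields (index, item) pairs; unpack each item further inside the loop
for idx, pair in enumerate(pairs):
    girl_name, boy_name = pair
    print('Rank {}: {} and {}'.format(idx, girl_name, boy_name))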

import

import sys
print(sys.path)

import sys
for pth in sys.path:
    print(pth)

import os
os.getcwd()
os.chdir("/tmp/")
os.getcwd()

In Python, import, import ... as and from ... import can appear anywhere in a program where a statement may appear.

import xmath
print(xmath.max(10, 5))
print(xmath.sum(1, 2, 3, 4, 5))

import xmath as math  # give the xmath module the alias math
print(math.e)

from xmath import min  # copies min into the current module; from modu import * is not recommended because it easily causes name clashes
print(min(10, 5))
==========

import os

import os
os.system("your command")
print("\n")

import os
os.system("dir c:\\")
==========

print without a newline or space

In Python 3, to print without a trailing newline or space, pass the end argument: print(str, end='')

Samples

max1 = a if a > b else b

def max(a, b):
    return a if a > b else b

def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n-1)

import math

math.factorial(n)

Python Code Examples

You can find more Python code examples at the bottom of this page. Using pywhois Magic 8-ball CommandLineFu with Python Port scanner in Python Google Command Line Script Date and Time Script Bitly Shortener with Python Sending Mails using Gmail Command Line speedtest.net via tespeed Search computer for specific files Get the Geo Location of an IP Address Get the username from a prompt Tweet Search using Python Date and Time in Python Python Turtle Racing! Python Game : Rolling the dice Monitor Apache / Nginx Log File Log Checker in Python Python : Guessing Game part 2 Guessing Game written in Python Python Password Generator Convert KM/H to MPH Get all the links from a website Celsius and Fahrenheit Converter Calculate the average score Check your external IP address Python Hangman Game Python Command Line IMDB Scraper

Python code examples

Here we link to other sites that provide Python code examples. ActiveState Code - Popular Python recipes Snipplr.com Nullege - Search engine for Python source code Snipt.net

Python Programming Examples

Recent Articles on Python!
Python Output & Multiple Choice Questions
 
Topics:

Basic Programs:

add two numbers factorial of a number simple interest compound interest check Armstrong Number Program to find area of a circle print all Prime numbers in an Interval check whether a number is Prime or not n-th Fibonacci number Fibonacci numbers How to check if a given number is Fibonacci number? n'th multiple of a number in Fibonacci Series Program to print ASCII Value of a character Sum of squares of first n natural numbers cube sum of first n natural numbers Array Programs:

find sum of array find largest element in an array array rotation Reversal algorithm for array rotation Split the array and add the first part to the end Find remainder of array multiplication divided by n Reconstruct the array by replacing arr[i] with (arr[i-1]+1) % M check if given array is Monotonic List Programs:

interchange first and last elements in a list swap two elements in a list remove Nth occurrence of the given word Python | Ways to find length of list Python | Ways to check if element exists in list Different ways to clear a list in Python Python | Reversing a List Python | Cloning or Copying a list Python | Count occurrences of an element in a list find sum of elements in list Python | Multiply all numbers in the list find smallest number in a list find largest number in a list find second largest number in a list find N largest elements from a list print even numbers in a list print odd numbers in a List print all even numbers in a range print all odd numbers in a range count Even and Odd numbers in a List print positive numbers in a list print negative numbers in a list print all positive numbers in a range print all negative numbers in a range count positive and negative numbers in a list Remove multiple elements from a list in Python Python | Remove empty tuples from a list Python | Program to print duplicates from a list of integers find Cumulative sum of a list Break a list into chunks of size N in Python Python | Sort the values of first list using second list More >> String Programs:

check if a string is palindrome or not Reverse words in a given String in Python Ways to remove i'th character from string in Python Python | Check if a Substring is Present in a Given String Find length of a string in python (4 ways) print even length words in a string Python | Program to accept the strings which contains all vowels Python | Count the Number of matching characters in a pair of string count number of vowels using sets in given string Remove all duplicates from a given string in Python Python | Program to check if a string contains any special character Generating random strings until a given string is generated Find words which are greater than given length k removing i-th character from a string split and join a string Python | Check if a given string is binary string or not Python | Find all close matches of input string from a list find uncommon words from two Strings Python | Swap commas and dots in a String Python | Permutation of a given string using inbuilt function Python | Check for URL in a String Execute a String of Code in Python String slicing in Python to rotate a string String slicing in Python to check if a string can become empty by recursive deletion Python Counter | Find all duplicate characters in string More >> Dictionary Programs:

Python | Sort Python Dictionaries by Key or Value Handling missing keys in Python dictionaries Python dictionary with keys having multiple inputs find the sum of all items in a dictionary Python | Ways to remove a key from dictionary Ways to sort list of dictionaries by values in Python – Using itemgetter Ways to sort list of dictionaries by values in Python – Using lambda function Python | Merging two Dictionaries Program to create grade calculator in Python Python | Check order of character in string using OrderedDict() Python | Find common elements in three sorted arrays by dictionary intersection Dictionary and counter in Python to find winner of election Find all duplicate characters in string Print anagrams together in Python using List and Dictionary Check if binary representations of two numbers are anagram Python Counter to find the size of largest subset of anagram words Python | Remove all duplicates words from a given sentence Python Dictionary to find mirror characters in a string Counting the frequencies in a list using dictionary in Python Python | Convert a list of Tuples into Dictionary Python counter and dictionary intersection example (Make a string using deletion and rearrangement) Python dictionary, set and counter to check if frequencies can become same Scraping And Finding Ordered Words In A Dictionary using Python Possible Words using given characters in Python More >> Tuple Programs:

Create a list of tuples from given list having number and its cube in each tuple Sort a list of tuples by second Item More >> Searching and Sorting Programs:

Binary Search (Recursive and Iterative) Linear Search Insertion Sort Recursive Insertion Sort QuickSort Iterative Quick Sort Selection Sort Bubble Sort Merge Sort Iterative Merge Sort Heap Sort Counting Sort ShellSort Topological Sorting Radix Sort Binary Insertion Sort Bitonic Sort Comb Sort Pigeonhole Sort Cocktail Sort Gnome Sort Odd-Even Sort / Brick Sort BogoSort or Permutation Sort Cycle Sort Stooge Sort Pattern Printing Programs:

Program to print the pattern 'G' Python | Print an Inverted Star Pattern Python 3 | Program to print double sided stair-case pattern Print with your own font using Python!! Date-Time Programs:

convert time from 12 hour to 24 hour format More Python Programs:

Reverse a linked list Find largest prime factor of a number Efficient program to print all prime factors of a given number Product of unique prime factors of a number Find sum of odd factors of a number Coin Change Tower of Hanoi Sieve of Eratosthenes Check if binary representation is palindrome Basic Euclidean algorithms Extended Euclidean algorithms Number of elements with odd factors in given range Common Divisors of Two Numbers Maximum height when coins are arranged in a triangle GCD of more than two (or array) numbers Check if count of divisors is even or odd Find minimum sum of factors of number Difference between sums of odd and even digits Program to Print Matrix in Z form Largest K digit number divisible by X Smallest K digit number divisible by X Print Number series without using any loop Number of stopping station problem Program to calculate area of a Tetrahedron focal length of a spherical mirror Find the perimeter of a cylinder Check if a triangle of positive area is possible with the given angles Number of jump required of given length to reach a point of form (d, 0) from origin in 2D plane Finding the vertex, focus and directrix of a parabola find the most occurring character and its count Find sum of even factors of a number Check if all digits of a number divide it convert float decimal to Octal number convert floating to binary Check whether a number has consecutive 0's in the given base or not Number of solutions to Modular Equations Triangular Matchstick Number Legendre's Conjecture check if a string contains all unique characters copy odd lines of one file to other

some samples

Please note that these examples are written in Python 2, and may need some adjustment to run under Python 3. 1 line: Output print('Hello, world!')
2 lines: Input, assignment name = raw_input('What is your name?\n') print('Hi, %s.' % name)
3 lines: For loop, built-in enumerate function, new style formatting friends = ['john', 'pat', 'gary', 'michael'] for i, name in enumerate(friends): print("iteration {iteration} is {name}".format(iteration=i, name=name))
4 lines: Fibonacci, tuple assignment parents, babies = (1, 1) while babies < 100: print('This generation has {0} babies'.format(babies)) parents, babies = (babies, parents + babies)
5 lines: Functions def greet(name): print('Hello', name) greet('Jack') greet('Jill') greet('Bob')
6 lines: Import, regular expressions import re for test_string in ['555-1212', 'ILL-EGAL']: if re.match(r'^\d{3}-\d{4}$', test_string): print(test_string, 'is a valid US local phone number') else: print(test_string, 'rejected')
7 lines: Dictionaries, generator expressions prices = {'apple': 0.40, 'banana': 0.50} my_purchase = { 'apple': 1, 'banana': 6} grocery_bill = sum(prices[fruit] * my_purchase[fruit] for fruit in my_purchase) print('I owe the grocer $%.2f' % grocery_bill)
8 lines: Command line arguments, exception handling # This program adds up integers in the command line import sys try: total = sum(int(arg) for arg in sys.argv[1:]) print('sum =', total) except ValueError: print('Please supply integer arguments')
9 lines: Opening files # indent your Python code to put into an email import glob # glob supports Unix style pathname extensions python_files = glob.glob('*.py') for file_name in sorted(python_files): print(' ------' + file_name) with open(file_name) as f: for line in f: print(' ' + line.rstrip()) print
10 lines: Time, conditionals, from..import, for..else from time import localtime activities = {8: 'Sleeping', 9: 'Commuting', 17: 'Working', 18: 'Commuting', 20: 'Eating', 22: 'Resting' } time_now = localtime() hour = time_now.tm_hour for activity_time in sorted(activities.keys()): if hour < activity_time: print(activities[activity_time]) break else: print('Unknown, AFK or sleeping!')
11 lines: Triple-quoted strings, while loop REFRAIN = ''' %d bottles of beer on the wall, %d bottles of beer, take one down, pass it around, %d bottles of beer on the wall! ''' bottles_of_beer = 99 while bottles_of_beer > 1: print(REFRAIN % (bottles_of_beer, bottles_of_beer, bottles_of_beer - 1)) bottles_of_beer -= 1
12 lines: Classes class BankAccount(object): def __init__(self, initial_balance=0): self.balance = initial_balance def deposit(self, amount): self.balance += amount def withdraw(self, amount): self.balance -= amount def overdrawn(self): return self.balance < 0 my_account = BankAccount(15) my_account.withdraw(5) print(my_account.balance)
13 lines: Unit testing with unittest import unittest def median(pool): copy = sorted(pool) size = len(copy) if size % 2 == 1: return copy[(size - 1) / 2] else: return (copy[size/2 - 1] + copy[size/2]) / 2 class TestMedian(unittest.TestCase): def testMedian(self): self.failUnlessEqual(median([2, 9, 9, 7, 9, 2, 4, 5, 8]), 7) if __name__ == '__main__': unittest.main()
14 lines: Doctest-based testing def median(pool): '''Statistical median to demonstrate doctest. >>> median([2, 9, 9, 7, 9, 2, 4, 5, 8]) 7 ''' copy = sorted(pool) size = len(copy) if size % 2 == 1: return copy[(size - 1) / 2] else: return (copy[size/2 - 1] + copy[size/2]) / 2 if __name__ == '__main__': import doctest doctest.testmod()
15 lines: itertools from itertools import groupby lines = ''' This is the first paragraph. This is the second. '''.splitlines() # Use itertools.groupby and bool to return groups of # consecutive lines that either have content or don't. for has_chars, frags in groupby(lines, bool): if has_chars: print(' '.join(frags)) # PRINTS: # This is the first paragraph. # This is the second.
16 lines: csv module, tuple unpacking, cmp() built-in import csv # write stocks data as comma-separated values writer = csv.writer(open('stocks.csv', 'wb', buffering=0)) writer.writerows([ ('GOOG', 'Google, Inc.', 505.24, 0.47, 0.09), ('YHOO', 'Yahoo! Inc.', 27.38, 0.33, 1.22), ('CNET', 'CNET Networks, Inc.', 8.62, -0.13, -1.49) ]) # read stocks data, print status messages stocks = csv.reader(open('stocks.csv', 'rb')) status_labels = {-1: 'down', 0: 'unchanged', 1: 'up'} for ticker, name, price, change, pct in stocks: status = status_labels[cmp(float(change), 0.0)] print('%s is %s (%s%%)' % (name, status, pct))
18 lines: 8-Queens Problem (recursion) BOARD_SIZE = 8 def under_attack(col, queens): left = right = col for r, c in reversed(queens): left, right = left - 1, right + 1 if c in (left, col, right): return True return False def solve(n): if n == 0: return [[]] smaller_solutions = solve(n - 1) return [solution+[(n,i+1)] for i in xrange(BOARD_SIZE) for solution in smaller_solutions if not under_attack(i+1, solution)] for answer in solve(BOARD_SIZE): print(answer)
20 lines: Prime numbers sieve w/fancy generators import itertools def iter_primes(): # an iterator of all numbers between 2 and +infinity numbers = itertools.count(2) # generate primes forever while True: # get the first number from the iterator (always a prime) prime = numbers.next() yield prime # this code iteratively builds up a chain of # filters...slightly tricky, but ponder it a bit numbers = itertools.ifilter(prime.__rmod__, numbers) for p in iter_primes(): if p > 1000: break print(p)
21 lines: XML/HTML parsing (using Python 2.5 or third-party library) dinner_recipe = '''<html><body><table> <tr><th>amt</th><th>unit</th><th>item</th></tr> <tr><td>24</td><td>slices</td><td>baguette</td></tr> <tr><td>2+</td><td>tbsp</td><td>olive oil</td></tr> <tr><td>1</td><td>cup</td><td>tomatoes</td></tr> <tr><td>1</td><td>jar</td><td>pesto</td></tr> </table></body></html>''' # In Python 2.5 or from http://effbot.org/zone/element-index.htm import xml.etree.ElementTree as etree tree = etree.fromstring(dinner_recipe) # For invalid HTML use http://effbot.org/zone/element-soup.htm # import ElementSoup, StringIO # tree = ElementSoup.parse(StringIO.StringIO(dinner_recipe)) pantry = set(['olive oil', 'pesto']) for ingredient in tree.getiterator('tr'): amt, unit, item = ingredient if item.tag == "td" and item.text not in pantry: print("%s: %s %s" % (item.text, amt.text, unit.text))
28 lines: 8-Queens Problem (define your own exceptions) BOARD_SIZE = 8 class BailOut(Exception): pass def validate(queens): left = right = col = queens[-1] for r in reversed(queens[:-1]): left, right = left-1, right+1 if r in (left, col, right): raise BailOut def add_queen(queens): for i in range(BOARD_SIZE): test_queens = queens + [i] try: validate(test_queens) if len(test_queens) == BOARD_SIZE: return test_queens else: return add_queen(test_queens) except BailOut: pass raise BailOut queens = add_queen([]) print(queens) print("\n".join(". "*q + "Q " + ". "*(BOARD_SIZE-q-1) for q in queens))
33 lines: "Guess the Number" Game (edited) from http://inventwithpython.com import random guesses_made = 0 name = raw_input('Hello! What is your name?\n') number = random.randint(1, 20) print('Well, {0}, I am thinking of a number between 1 and 20.'.format(name)) while guesses_made < 6: guess = int(raw_input('Take a guess: ')) guesses_made += 1 if guess < number: print('Your guess is too low.') if guess > number: print('Your guess is too high.') if guess == number: break if guess == number: print('Good job, {0}! You guessed my number in {1} guesses!'.format(name, guesses_made)) else: print('Nope. The number I was thinking of was {0}'.format(number))

Functions in Python Math Module

List of Functions in Python Math Module
Function Description
ceil(x) Returns the smallest integer greater than or equal to x.
copysign(x, y) Returns x with the sign of y
fabs(x) Returns the absolute value of x
factorial(x) Returns the factorial of x
floor(x) Returns the largest integer less than or equal to x
fmod(x, y) Returns the remainder when x is divided by y
frexp(x) Returns the mantissa and exponent of x as the pair (m, e)
fsum(iterable) Returns an accurate floating point sum of values in the iterable
isfinite(x) Returns True if x is neither an infinity nor a NaN (Not a Number)
isinf(x) Returns True if x is a positive or negative infinity
isnan(x) Returns True if x is a NaN
ldexp(x, i) Returns x * (2**i)
modf(x) Returns the fractional and integer parts of x
trunc(x) Returns the truncated integer value of x
exp(x) Returns e**x
expm1(x) Returns e**x - 1
log(x[, base]) Returns the logarithm of x to the base (defaults to e)
log1p(x) Returns the natural logarithm of 1+x
log2(x) Returns the base-2 logarithm of x
log10(x) Returns the base-10 logarithm of x
pow(x, y) Returns x raised to the power y
sqrt(x) Returns the square root of x
acos(x) Returns the arc cosine of x
asin(x) Returns the arc sine of x
atan(x) Returns the arc tangent of x
atan2(y, x) Returns atan(y / x)
cos(x) Returns the cosine of x
hypot(x, y) Returns the Euclidean norm, sqrt(x*x + y*y)
sin(x) Returns the sine of x
tan(x) Returns the tangent of x
degrees(x) Converts angle x from radians to degrees
radians(x) Converts angle x from degrees to radians
acosh(x) Returns the inverse hyperbolic cosine of x
asinh(x) Returns the inverse hyperbolic sine of x
atanh(x) Returns the inverse hyperbolic tangent of x
cosh(x) Returns the hyperbolic cosine of x
sinh(x) Returns the hyperbolic sine of x
tanh(x) Returns the hyperbolic tangent of x
erf(x) Returns the error function at x
erfc(x) Returns the complementary error function at x
gamma(x) Returns the Gamma function at x
lgamma(x) Returns the natural logarithm of the absolute value of the Gamma function at x
pi Mathematical constant, the ratio of the circumference of a circle to its diameter (3.14159...)
e Mathematical constant e, the base of the natural logarithm (2.71828...)

def factorial(n):
    num = 1
    while n >= 1:
        num = num * n
        n = n - 1
    return num

from math import factorial

print(factorial(1000))

def factorial(x):
    result = 1
    for i in range(2, x + 1):  # xrange in Python 2
        result *= i
    return result

print(factorial(1000))

def factorial(n):
    if n < 2:
        return 1
    return n * factorial(n - 1)

def factorial(n):
    base = 1
    for i in range(n, 0, -1):
        base = base * i
    print(base)

divmod(x, y)

returns the tuple (x // y, x % y), that is, the floor-division quotient and the remainder

The method list()

takes sequence types and converts them to lists. This is used to convert a given tuple into a list. Note: tuples are very similar to lists, the only difference being that the element values of a tuple cannot be changed, and tuple elements are written between parentheses instead of square brackets.
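A quick usage example of the list() conversion (the tuple values are arbitrary):

aTuple = (123, 'xyz', 'zara', 'abc')
aList = list(aTuple)
print(aList)      # [123, 'xyz', 'zara', 'abc']
aList[0] = 456    # allowed, because lists are mutable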

itertools.product()

This tool computes the cartesian product of input iterables. It is equivalent to nested for-loops. For example, product(A, B) returns the same as ((x, y) for x in A for y in B).
Sample code:
from itertools import product

print(list(product([1,2,3], repeat=2)))
# [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3)]

print(list(product([1,2,3], [3,4])))
# [(1, 3), (1, 4), (2, 3), (2, 4), (3, 3), (3, 4)]

A = [[1,2,3],[3,4,5]]
print(list(product(*A)))
# [(1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 3), (3, 4), (3, 5)]

B = [[1,2,3],[3,4,5],[7,8]]
print(list(product(*B)))
# [(1, 3, 7), (1, 3, 8), (1, 4, 7), (1, 4, 8), (1, 5, 7), (1, 5, 8), (2, 3, 7), (2, 3, 8), (2, 4, 7), (2, 4, 8), (2, 5, 7), (2, 5, 8), (3, 3, 7), (3, 3, 8), (3, 4, 7), (3, 4, 8), (3, 5, 7), (3, 5, 8)]

How to use Loops in Python

For Loop

computer_brands = ["Apple", "Asus", "Dell", "Samsung"]
for brands in computer_brands:
    print(brands)

numbers = [1, 10, 20, 30, 40, 50]
sum = 0
for number in numbers:
    sum = sum + number
print(sum)

for i in range(1, 10):
    print(i)

Break

To break out of a loop, you can use the keyword "break".
for i in range(1, 10):
    if i == 3:
        break
    print(i)  # will print 1 and 2 only

Continue

The continue statement is used to tell Python to skip the rest of the statements in the current loop block and to continue to the next iteration of the loop.
for i in range(1, 10):
    if i == 3:
        continue
    print(i)  # will print everything except 3

While Loop

computer_brands = ["Apple", "Asus", "Dell", "Samsung"]
i = 0
while i < len(computer_brands):
    print(computer_brands[i])
    i = i + 1

while True:
    answer = raw_input("Start typing...")  # use input() in Python 3
    if answer == "quit":
        break
    print("Your answer was", answer)

counter = 0
while counter <= 100:
    print(counter)
    counter += 2

Nested Loops

for x in range(1, 11):
    for y in range(1, 11):
        print('%d * %d = %d' % (x, y, x*y))

random

import random

a = [1, 2, 3, 4, 5, 6]
print(a)
random.shuffle(a)
print(a)

items = [1, 2, 3, 4, 5, 6, 7]
random.shuffle(items)
print(items)

Tkinter

Popular Python tkinter Projects
Tkinter references

from Tkinter import *  # Python 2 import

root = Tk()
w = Label(root, text="Hello Tkinter!")
w.pack()
root.mainloop()
input("\n\nhit any key\n\n")
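For Python 3, a minimal equivalent of the hello-window snippet above (the module name is lowercase tkinter there):

import tkinter as tk

root = tk.Tk()
label = tk.Label(root, text="Hello Tkinter!")
label.pack()
root.mainloop()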

Hotel Management Systems

How to Create Hotel Management Systems in Python - Full Tutorial Python With Tkinter & Sqlite 3

TensorFlow installation (Tensorflow 安装)

https://morvanzhou.github.io/tutorials/machine-learning/tensorflow/1-2-install/ TensorFlow installation guide
https://medium.com/@lmoroney_40129/installing-tensorflow-with-gpu-on-windows-10-3309fec55a00 Installing TensorFlow with GPU on Windows 10

pip3 install --upgrade https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-0.12.0-py3-none-any.whl

The script wheel.exe is installed in 'd:\python36-32\Scripts' which is not on PATH. Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.

import platform
print(platform.python_version())
help('modules')

# TensorFlow 1.x session-based check
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))
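The session-based snippet above targets TensorFlow 1.x. In TensorFlow 2.x eager execution is the default, so a rough equivalent of the same sanity check would be:

import tensorflow as tf

hello = tf.constant('Hello, TensorFlow!')
print(hello.numpy())  # no Session needed under eager execution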

Python Simple HTTP server

# Python 2
import SimpleHTTPServer
import SocketServer

PORT = 8000
Handler = SimpleHTTPServer.SimpleHTTPRequestHandler
httpd = SocketServer.TCPServer(("", PORT), Handler)
print("serving at port", PORT)
httpd.serve_forever()
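In Python 3 these modules were merged into http.server and socketserver; a minimal sketch of the same server:

import http.server
import socketserver

PORT = 8000
Handler = http.server.SimpleHTTPRequestHandler
with socketserver.TCPServer(("", PORT), Handler) as httpd:
    print("serving at port", PORT)
    httpd.serve_forever()

For quick one-offs, python -m http.server 8000 does the same thing from the command line.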

about pip

C:\Users\User\Desktop>python -m pip install -U pylint --user
pip then warns that the scripts were installed in 'C:\Users\User\AppData\Roaming\Python\Python36\Scripts', which is not on PATH.

run Python from Sublime Text

use SublimeREPL Tools -> Build System -> (choose) Python then: To Run: Tools -> Build -or- Ctrl + B This would start your file in the console which should be at the bottom of the editor. To Stop: Ctrl + Break or Tools -> Cancel Build You can find out where your Break key is here: http://en.wikipedia.org/wiki/Break_key. Note: CTRL + C will NOT work. What to do when Ctrl + Break does not work: Go to:
Preferences -> Key Bindings - User
and paste the line below: {"keys": ["ctrl+shift+c"], "command": "exec", "args": {"kill": true} } Now, you can use ctrl+shift+c instead of CTRL+BREAK But when CtrlB does not work, Sublime Text probably can't find the Python Interpreter. When trying to run your program, see the log and find the reference to Python in path. [cmd: [u'python', u'-u', u'C:\\scripts\\test.py']] [path: ...;C:\Python27 32bit;...] The point is that it tries to run python via command line, the cmd looks like: python -u C:\scripts\test.py If you can't run python from cmd, Sublime Text can't too. (Try it yourself in cmd, type python in it and run it, python commandline should appear)

SOLUTION

You can either change the Sublime Text build formula or the System %PATH%. To set your %PATH%: *You will need to restart your editor to load new %PATH% Run Command Line* and enter this command: *needs to be run as administrator SETX /M PATH "%PATH%;<python_folder>" for example: SETX /M PATH "%PATH%;C:\Python27;C:\Python27\Scripts"OR manually: (preferable) Add ;C:\Python27;C:\Python27\Scripts at the end of the string.

'calendar' has no attribute 'month'

AttributeError: module 'calendar' has no attribute 'month' import calendar yy = 2018 mm = 11 print(calendar.month(yy, mm)) AttributeError: module 'calendar' has no attribute 'month' The problem is that you used the name calendar.py for your file. Use any other name, and you will be able to import the python module calendar.

Reading and Writing Files

file_object = open("filename", "mode")

thisFile = open("thedatafile.txt", "r")
print(thisFile)

file = open("testfile.txt", "w")
file.write("Hello World")
file.close()

There are a number of ways to read a text file in Python, not just one.
file = open("testfile.txt", "r")
print(file.read())
print(file.read(5))   # read the first five characters

readline() reads a file line by line:
print(file.readline())
print(file.readline(3))   # read at most 3 characters of the next line

file.readlines() returns every line:
print(file.readlines())

for line in file:   # looping over a file object
    print(line)

Add an EOL character to start a new line:
file.write("This is a test")
file.write("To add more lines.")
file.close()

fh.close(): to end things, close the file completely.

Opening a text file:
fh = open("hello.txt", "r")

Reading a text file:
fh = open("hello.txt", "r")
print(fh.read())

To read a text file one line at a time:
fh = open("hello.text", "r")
print(fh.readline())

To read a list of lines in a text file:
fh = open("hello.txt", "r")
print(fh.readlines())

To write new content or text to a file:
fh = open("hello.txt", "w")
fh.write("Put the text you want to add here")
fh.write("and more lines if need be.")
fh.close()

Write multiple lines to a file at once:
lines_of_text = ["One line of text here", "and another line here", "and yet another here", "and so on and so forth"]
fh.writelines(lines_of_text)
fh.close()

To append to a file:
fh = open("hello.txt", "a")
fh.write("We Meet Again World")
fh.close()

With Statement
with open("testfile.txt") as file:
    data = file.read()
    # do something with data

with open("testfile.txt") as f:
    for line in f:
        print(line)

The above example didn't use the file.close() method because the with statement closes the file automatically when the block ends.

with open("hello.txt", "w") as f:
    f.write("Hello World")

To read a file line by line, output into a list:
with open("hello.txt") as f:
    data = f.readlines()

Splitting Lines in a Text File
with open("hello.text", "r") as f:
    data = f.readlines()
for line in data:
    words = line.split()
    print(words)
# use a colon instead of a space to split:
line.split(":")
======================
Python's list slice syntax can be used without indices for a few fun and useful things:

# You can clear all elements from a list:
>>> lst = [1, 2, 3, 4, 5]
>>> del lst[:]
>>> lst
[]

# You can replace all elements of a list
# without creating a new list object:
>>> a = lst
>>> lst[:] = [7, 8, 9]
>>> lst
[7, 8, 9]
>>> a
[7, 8, 9]
>>> a is lst
True

# You can also create a (shallow) copy of a list:
>>> b = lst[:]
>>> b
[7, 8, 9]
>>> b is lst
False
======================
CPython easter egg
# Here's a fun little CPython easter egg.
# Just run the following in a Python 2.7+
# interpreter session:
>>> import antigravity

unicodeescape codec can't decode bytes…

Unicode Error: "unicodeescape" codec can't decode bytes… The problem is with the string "C:\Users\Eric\Desktop\beeline.txt": in a normal string literal, \U starts an eight-character Unicode escape, such as '\U00014321'.
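Common fixes, shown as a short sketch (the path is just the illustrative one from the error above):

# 1) use a raw string so backslashes are not treated as escape sequences
path = r"C:\Users\Eric\Desktop\beeline.txt"
# 2) or escape the backslashes
path = "C:\\Users\\Eric\\Desktop\\beeline.txt"
# 3) or use forward slashes, which Windows also accepts
path = "C:/Users/Eric/Desktop/beeline.txt"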

findfiles

import fnmatch  # fnmatch — Unix filename pattern matching
import os

images = ['*.', '*.py']
matches = []
for root, dirnames, filenames in os.walk("D:/Users/Lawht/Desktop"):
    for extensions in images:
        for filename in fnmatch.filter(filenames, extensions):
            matches.append(os.path.join(root, filename))
            print(filename)

for root, dirnames, filenames in os.walk("C:/Users/User/Desktop"):
    print(root)
    # print(dirnames)
    # print(filenames)

for root in os.walk("C:/Users/User/Desktop"):
    print(root)

walk()

walk() generates the file names in a directory tree.
import os
for root, dirs, files in os.walk("."):
    for name in files:
        print(os.path.join(root, name))
    for name in dirs:
        print(os.path.join(root, name))

add two matrices

# Program to add two matrices using nested loops
X = [[12,7,3], [4,5,6], [7,8,9]]
Y = [[5,8,1], [6,7,3], [4,5,9]]
result = [[0,0,0], [0,0,0], [0,0,0]]

# iterate through rows
for i in range(len(X)):
    # iterate through columns
    for j in range(len(X[0])):
        result[i][j] = X[i][j] + Y[i][j]

for r in result:
    print(r)

Use a matrix library: the numpy module has support for this.
import numpy as np
a = np.matrix([[1,2,3], [4,5,6], [7,8,9]])
b = np.matrix([[9,8,7], [6,5,4], [3,2,1]])
print(a+b)

find the most common elements in an iterable:

>>> import collections
>>> c = collections.Counter('helloworld')
>>> c
Counter({'l': 3, 'o': 2, 'e': 1, 'd': 1, 'h': 1, 'r': 1, 'w': 1})
>>> c.most_common(3)
[('l', 3), ('o', 2), ('e', 1)]

itertools.permutations() generates permutations

# for an iterable. Time to brute-force those passwords ;-) >>> import itertools >>> for p in itertools.permutations('ABCD'): ... print(p) ('A', 'B', 'C', 'D') ('A', 'B', 'D', 'C') ('A', 'C', 'B', 'D') ('A', 'C', 'D', 'B') ('A', 'D', 'B', 'C') ('A', 'D', 'C', 'B') ('B', 'A', 'C', 'D') ('B', 'A', 'D', 'C') ('B', 'C', 'A', 'D') ('B', 'C', 'D', 'A') ('B', 'D', 'A', 'C') ('B', 'D', 'C', 'A') ('C', 'A', 'B', 'D') ('C', 'A', 'D', 'B') ('C', 'B', 'A', 'D') ('C', 'B', 'D', 'A') ('C', 'D', 'A', 'B') ('C', 'D', 'B', 'A') ('D', 'A', 'B', 'C') ('D', 'A', 'C', 'B') ('D', 'B', 'A', 'C') ('D', 'B', 'C', 'A') ('D', 'C', 'A', 'B') ('D', 'C', 'B', 'A')

When To Use __repr__ vs __str__?

# Emulate what the std lib does:
>>> import datetime
>>> today = datetime.date.today()

# Result of __str__ should be readable:
>>> str(today)
'2017-02-02'

# Result of __repr__ should be unambiguous:
>>> repr(today)
'datetime.date(2017, 2, 2)'

# Python interpreter sessions use
# __repr__ to inspect objects:
>>> today
datetime.date(2017, 2, 2)
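A small sketch of defining both methods on your own class, following the same convention; the class and field names are made up for illustration:

import datetime

class Meeting:
    def __init__(self, topic, when):
        self.topic = topic
        self.when = when

    def __repr__(self):
        # unambiguous, ideally valid Python for recreating the object
        return 'Meeting({!r}, {!r})'.format(self.topic, self.when)

    def __str__(self):
        # readable, for end users
        return '{} at {}'.format(self.topic, self.when)

m = Meeting('standup', datetime.date(2017, 2, 2))
print(str(m))    # standup at 2017-02-02
print(repr(m))   # Meeting('standup', datetime.date(2017, 2, 2))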

use Python's built-in "dis"

# module to disassemble functions and # inspect their CPython VM bytecode: >>> def greet(name): ... return 'Hello, ' + name + '!' >>> greet('Dan') 'Hello, Dan!' >>> import dis >>> dis.dis(greet) 2 0 LOAD_CONST 1 ('Hello, ') 2 LOAD_FAST 0 (name) 4 BINARY_ADD 6 LOAD_CONST 2 ('!') 8 BINARY_ADD 10 RETURN_VALUE # @classmethod vs @staticmethod vs "plain" methods # What's the difference? class MyClass: def method(self): """ Instance methods need a class instance and can access the instance through `self`. """ return 'instance method called', self @classmethod def classmethod(cls): """ Class methods don't need a class instance. They can't access the instance (self) but they have access to the class itself via `cls`. """ return 'class method called', cls @staticmethod def staticmethod(): """ Static methods don't have access to `cls` or `self`. They work like regular functions but belong to the class's namespace. """ return 'static method called' # All methods types can be # called on a class instance: >>> obj = MyClass() >>> obj.method() ('instance method called') >>> obj.classmethod() ('class method called') >>> obj.staticmethod() 'static method called' # Calling instance methods fails # if we only have the class object: >>> MyClass.classmethod() ('class method called') >>> MyClass.staticmethod() 'static method called' >>> MyClass.method() TypeError: "unbound method method() must be called with MyClass " "instance as first argument (got nothing instead)" # In Python 3.4+ you can use contextlib.suppress() to selectively ignore specific exceptions: import contextlib with contextlib.suppress(FileNotFoundError): os.remove('somefile.tmp') # This is equivalent to: try: os.remove('somefile.tmp') except FileNotFoundError: pass # Pythonic ways of checking if all items in a list are equal: >>> lst = ['a', 'a', 'a'] >>> len(set(lst)) == 1 True >>> all(x == lst[0] for x in lst) True >>> lst.count(lst[0]) == len(lst) True # Python's `for` and `while` loops # support an `else` clause that executes # only if the loops terminates without # hitting a `break` statement. def contains(haystack, needle): """ Throw a ValueError if `needle` not in `haystack`. """ for item in haystack: if item == needle: break else: # The `else` here is a # "completion clause" that runs # only if the loop ran to completion # without hitting a `break` statement. raise ValueError('Needle not found') >>> contains([23, 'needle', 0xbadc0ffee], 'needle') None >>> contains([23, 42, 0xbadc0ffee], 'needle') ValueError: "Needle not found" # better way for `for` and `while` loops support an `else` clause that executes only if the loops terminates without hitting a `break` statement., something like this: def better_contains(haystack, needle): for item in haystack: if item == needle: return raise ValueError('Needle not found') # Note: Typically you'd write something like this to do a membership test, which is much more Pythonic: if needle not in haystack: raise ValueError('Needle not found') # Virtual Environments ("virtualenvs") keep your project dependencies separated. # Before creating & activating a virtualenv: `python` and `pip` map to the system version of the Python interpreter (e.g. 
Python 2.7) $ which python /usr/local/bin/python # Let's create a fresh virtualenv using another version of Python (Python 3): $ python3 -m venv ./venv # A virtualenv is just a "Python environment in a folder": $ ls ./venv bin include lib pyvenv.cfg # Activating a virtualenv configures the current shell session to use the python (and pip) commands from the virtualenv folder instead of the global environment: $ source ./venv/bin/activate # Note how activating a virtualenv modifies your shell prompt with a little note showing the name of the virtualenv folder: (venv) $ echo "wee!" # With an active virtualenv, the `python` command maps to the interpreter binary *inside the active virtualenv*: (venv) $ which python /Users/dan/my-project/venv/bin/python3 # Installing new libraries and frameworks with `pip` now installs them *into the virtualenv sandbox*, leaving your global environment (and any other virtualenvs) completely unmodified: (venv) $ pip install requests # To get back to the global Python environment, run the following command: (venv) $ deactivate # (See how the prompt changed back to "normal" again?) $ echo "yay!" # Deactivating the virtualenv flipped the `python` and `pip` commands back to the global environment: $ which python /usr/local/bin/python # Python 3.3+ has a std lib module for displaying tracebacks even when Python "dies", e.g with a segfault: import faulthandler faulthandler.enable() # Can also be enabled with "python -X faulthandler" from the command line. # Learn more here: https://docs.python.org/3/library/faulthandler.html

interacting with databases

SQLAlchemy Python Tutorial interacting with databases
# pip install PyMysql
Python database workflow diagram: the Connection / Cursor analogy (Python 資料庫圖解流程 Connection、Cursor比喻)

import sqlite3

conn = sqlite3.connect("EX.db")
cur = conn.cursor()

def table():
    cur.execute("CREATE TABLE exampl(rollno REAL, Name TEXT, age REAL)")

def value():
    cur.execute("INSERT INTO exampl VALUES(1, 'Albert', 23)")
    conn.commit()
    # conn.close()
    # cur.close()

def show():
    cur.execute("SELECT * FROM exampl")
    data = cur.fetchall()
    print(data)
    # print(cur.fetchall())

table()
value()
show()
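A safer variant of the INSERT above, using sqlite3's ? placeholders instead of building the SQL string by hand; it assumes the exampl table created above already exists, and the inserted values are illustrative:

import sqlite3

conn = sqlite3.connect("EX.db")
cur = conn.cursor()
# placeholders let sqlite3 quote and escape the values for you
cur.execute("INSERT INTO exampl VALUES(?, ?, ?)", (2, "Marie", 31))
conn.commit()
conn.close()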

Pygame Tutorial

Python and Pygame Tutorial - Build Tetris! Full GameDev Course pygame Tutorial thepythongamebook Complete Pygame Project

PyFormat

PyFormat

PyInstaller

Making a Stand Alone Executable from a Python Script using PyInstaller There are plenty of tools available for converting a Python script into an executable. For example, check out: PyInstaller py2exe First install pyinstaller: pip install pyinstaller then run: pyinstaller -F testnumpy.py Convert Python Script Into Executable .exe File Using PyInstaller General Options youtube Convert PY to EXE Convert any Python File to .EXE py2exe official tutorial

kivy Create a package for Android

kivy Create a package for Android

Beginning Game Programming with Python

Beginning Game Programming with Python Making Games with Python & Pygame

Pygame

Games With Python And Pygame Pygame installation (Pygame 安裝)

python call batch file

import os
os.system("killtask.bat")

import os
os.chdir(r"X:\Enter location of .bat file")   # raw string avoids backslash escape issues
os.startfile("ask.bat")

python multiple choice

1000 python questions answers python multiple choice questions python simple multiple choice quiz Python read lines of file into list

Python Development Environment

Setting up a Python Development Environment in Sublime Text
Setting up a Python Development Environment: Visual Studio Code
To install VS Code, search for "visual studio code" (not "visual studio"); VS Code is free.
Activity bar on the left; its views can also be opened from the command palette (Ctrl+Shift+P).
Ctrl+Shift+E: Explorer
Ctrl+Shift+F: search and replace
Ctrl+Shift+G: Source Control (GitHub)
Ctrl+Shift+D: Debug
Ctrl+Shift+X: Extensions; each recommendation shows a reason. Search for "sublime text keymap"; popular extensions can be sorted by rating, name, or installs.
Zen Mode: Ctrl+K Z. Double Esc exits Zen Mode.
Python script to check the interpreter:
import sys
print(sys.version)
print(sys.executable)
Right-click in the editor and select "Run Python File in Terminal". On the bottom status bar, click the interpreter entry to switch between Python versions. Type cls in the terminal to clear the screen. Changing the interpreter creates a .vscode folder storing the runtime environment.
Ctrl+Shift+P opens the VS Code command palette: type "color theme" to select a color theme, or "file icon" to change the file icons. On the bottom status bar, far left, a gear icon called Manage can also open the command palette.
Select the default terminal by pressing F1 or Ctrl+Shift+P, typing "Shell", and choosing Select Default Shell. Pressing F1 or Ctrl+Shift+P and typing "default settings" shows the defaults.
Ctrl+` opens the terminal. Type "where python" to show the path, or type "python" to enter the interpreter; then import sys and sys.executable will show the path. To exit Python, type exit().
import sys
import requests
print(sys.version)
print(sys.executable)
print("hello")
r = requests.get("https://google.com")
print(r.status_code)

Python beep

import winsound
frequency = 1100   # frequency in hertz
duration = 1000    # duration in ms (1000 ms == 1 second)
winsound.Beep(frequency, duration)

Python encoding and Regex (Python 編碼 Regex)

Python Regex Flags python RE with utf8 Python RegEx Python re.UNICODE() Examples Unicode HOWTO Python's encodings (Python 的編碼) On garbled text in R, part 2 (谈谈R中的乱码(二))

text = '測試'
print(len(text))

# To support Unicode, Python 2.x provides the u prefix for creating unicode objects.
text = u'測試'
print(type(text))
print(len(text))

text = u'測試'
b_str = text.encode('utf-8')
type(b_str)
b_str.decode('utf-8')

Python 3 Unicode support and basic I/O (Python 3 Unicode 支援、基本 I/O)

Python Projects

Python Projects

image processing library

best image processing library for Python

The pass statement

The pass statement is a null operation; nothing happens when it executes. The pass is also useful in places where your code will eventually go but has not been written yet (e.g., in stubs).
Example:
for letter in 'Python':
    if letter == 'h':
        pass
        print('This is pass block')
    print('Current Letter :', letter)
print("Good bye!")

Result:
Current Letter : P
Current Letter : y
Current Letter : t
This is pass block
Current Letter : h
Current Letter : o
Current Letter : n
Good bye!

use main() function to call functions

def main():
    data = read_input_file('data.csv')
    report = generate_report(data)
    write_report(report)

# Application entry point -> call main()
main()
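A common refinement, shown here as a sketch: guard the entry point so main() runs only when the file is executed directly, not when it is imported as a module (read_input_file, generate_report and write_report are assumed to be defined elsewhere):

if __name__ == '__main__':
    main()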

contextlib.suppress() function

The contextlib.suppress() function is available in Python 3.4+. Use it to selectively ignore specific exceptions, via a context manager and the "with" statement:
import contextlib
import os

with contextlib.suppress(FileNotFoundError):
    os.remove('somefile.tmp')

This is equivalent to the following try/except clause:
try:
    os.remove('somefile.tmp')
except FileNotFoundError:
    pass

Parallel computing in Python

(in 60 seconds or less) parallel programming in Python Parallel Processing in Python – A Practical Guide with Examples parallel programming using Python's multiprocessing module If your Python programs are slower than you'd like you can often speed them up by *parallelizing* them. Basically, parallel computing allows you to carry out many calculations at the same time, thus reducing the amount of time it takes to run your program to completion. I know, this sounds fairly vague and complicated somehow...but bear with me for the next 50 seconds or so. Here's an end-to-end example of parallel computing in Python 2/3, using only tools built into the Python standard library— Ready? Go! First, we need to do some setup work. We'll import the "collections" and the "multiprocessing" module so we can use Python's parallel computing facilities and define the data structure we'll work with: import collections import multiprocessing Second, we'll use "collections.namedtuple" to define a new (immutable) data type we can use to represent our data set, a collection of scientists: Scientist = collections.namedtuple('Scientist', [ 'name', 'born', ]) scientists = ( Scientist(name='Ada Lovelace', born=1815), Scientist(name='Emmy Noether', born=1882), Scientist(name='Marie Curie', born=1867), Scientist(name='Tu Youyou', born=1930), Scientist(name='Ada Yonath', born=1939), Scientist(name='Vera Rubin', born=1928), Scientist(name='Sally Ride', born=1951), ) Third, we'll write a "data processing function" that accepts a scientist object and returns a dictionary containing the scientist's name and their calculated age: def process_item(item): return { 'name': item.name, 'age': 2017 - item.born } The process_item() function just represents a simple data transformation to keep this example short and sweet—but you could swap it out with a much more complex computation no problem. (20 seconds remaining) Fourth, and this is where the real parallelization magic happens, we'll set up a "multiprocessing pool" that allows us to spread our calculations across all available CPU cores. Then we call the pool's map() method to apply our process_item() function to all scientist objects, in parallel batches: pool = multiprocessing.Pool() result = pool.map(process_item, scientists) Note how batching and distributing the work across multiple CPU cores, performing the work, and collecting the results are all handled by the multiprocessing pool. How great is that? Fifth, we're all done here with 5 seconds remaining— Let's print(the results of our data transformation to the console so we can )make sure the program did what it was supposed to: print(tuple(result)) That's the end of our little program. And here's what you should expect to see printed out on your console: ({'name': 'Ada Lovelace', 'age': 202}, {'name': 'Emmy Noether', 'age': 135}, {'name': 'Marie Curie', 'age': 150}, {'name': 'Tu Youyou', 'age': 87}, {'name': 'Ada Yonath', 'age': 78}, {'name': 'Vera Rubin', 'age': 89}, {'name': 'Sally Ride', 'age': 66}) Isn't Python just lovely? Now, obviously I took some shortcuts here and picked an example that made parallelization seem effortless—
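One practical caveat worth adding to the walkthrough above: on Windows (and generally whenever the spawn start method is used), multiprocessing code should sit behind an __main__ guard so that worker processes do not re-execute the module's top-level code. A sketch of the same pipeline with that guard, using a shortened placeholder data set:

import collections
import multiprocessing

Scientist = collections.namedtuple('Scientist', ['name', 'born'])

scientists = (
    Scientist(name='Ada Lovelace', born=1815),
    Scientist(name='Emmy Noether', born=1882),
)

def process_item(item):
    # same simple transformation as in the walkthrough above
    return {'name': item.name, 'age': 2017 - item.born}

if __name__ == '__main__':
    with multiprocessing.Pool() as pool:
        result = pool.map(process_item, scientists)
    print(tuple(result))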

Python GuiProgramming

Python GUI Tkinter, PyQt, wxPython wxPython by Example wxPython GUI Application wxFormBuilder and wxPython Tutorial Python GuiProgramming python gui programming Python GUI Examples (Tkinter Tutorial) important PyQt5 tutorial create a Python GUI PyQt5 tutorial Tk tutorial Tk tutorial onepage Python Tkinter Intro tkinter Layout Managers
When first trying GUI programming, my choice was Python's built-in Tkinter, and I then looked for Tkinter documentation online: the second resource was the 2014 second edition of Xin Xing's Tkinter tutorial, which is clear and easy to follow; the third was Python GUI Programming Cookbook, which is detailed and helped a lot. Conclusion up front: Tkinter really isn't as bad as people say; it covers all the basic functionality and is simple to learn and easy to pick up. If you don't have high expectations for how the interface looks, Tkinter is a fine choice for writing a GUI.
*** Python GUI examples ***
*** tkinter layout ***
There are two kinds of widgets: containers and their children. The containers group their children into suitable layouts. Tkinter has three built-in layout managers: the pack, grid, and place managers. The place geometry manager positions widgets using absolute positioning. The pack geometry manager organises widgets in horizontal and vertical boxes. The grid geometry manager places widgets in a two-dimensional grid.
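A minimal sketch of the grid geometry manager mentioned above (Python 3 tkinter; the labels and layout are made up for illustration):

import tkinter as tk

root = tk.Tk()
# grid places widgets by row and column
tk.Label(root, text="Name:").grid(row=0, column=0, sticky="e")
tk.Entry(root).grid(row=0, column=1)
tk.Label(root, text="Email:").grid(row=1, column=0, sticky="e")
tk.Entry(root).grid(row=1, column=1)
tk.Button(root, text="OK").grid(row=2, column=1, sticky="e")
root.mainloop()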

comprehensive data exploration with python

comprehensive data exploration with python 'The most difficult thing in life is to know yourself' This quote belongs to Thales of Miletus. Thales was a Greek/Phonecian philosopher, mathematician and astronomer, which is recognised as the first individual in Western civilisation known to have entertained and engaged in scientific thought. I wouldn't say that knowing your data is the most difficult thing in data science, but it is time-consuming. Therefore, it's easy to overlook this initial step and jump too soon into the water. So I tried to learn how to swim before jumping into the water. Based on Hair et al. (2013), chapter 'Examining your data', I did my best to follow a comprehensive, but not exhaustive, analysis of the data. I'm far from reporting a rigorous study in this kernel, but I hope that it can be useful for the community, so I'm sharing how I applied some of those data analysis principles to this problem. Despite the strange names I gave to the chapters, what we are doing in this kernel is something like: Understand the problem We'll look at each variable and do a philosophical analysis about their meaning and importance for this problem. Univariable study We'll just focus on the dependent variable ('SalePrice') and try to know a little bit more about it. Multivariate study We'll try to understand how the dependent variable and independent variables relate. Basic cleaning We'll clean the dataset and handle the missing data, outliers and categorical variables. Test assumptions We'll check if our data meets the assumptions required by most multivariate techniques. Now, it's time to have fun! #invite people for the Kaggle party import pandas as pd import matplotlib.pyplot as plt import seaborn as sns import numpy as np from scipy.stats import norm from sklearn.preprocessing import StandardScaler from scipy import stats import warnings warnings.filterwarnings ( 'ignore' ) % matplotlib inline #bring in the six packs df_train = pd.read_csv ( '../input/train.csv' ) #check the decoration df_train.columns Index(['Id', 'MSSubClass', 'MSZoning', 'LotFrontage', 'LotArea', 'Street', 'Alley', 'LotShape', 'LandContour', 'Utilities', 'LotConfig', 'LandSlope', 'Neighborhood', 'Condition1', 'Condition2', 'BldgType', 'HouseStyle', 'OverallQual', 'OverallCond', 'YearBuilt', 'YearRemodAdd', 'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd', 'MasVnrType', 'MasVnrArea', 'ExterQual', 'ExterCond', 'Foundation', 'BsmtQual', 'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinSF1', 'BsmtFinType2', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', 'Heating', 'HeatingQC', 'CentralAir', 'Electrical', '1stFlrSF', '2ndFlrSF', 'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath', 'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'KitchenQual', 'TotRmsAbvGrd', 'Functional', 'Fireplaces', 'FireplaceQu', 'GarageType', 'GarageYrBlt', 'GarageFinish', 'GarageCars', 'GarageArea', 'GarageQual', 'GarageCond', 'PavedDrive', 'WoodDeckSF', 'OpenPorchSF', 'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea', 'PoolQC', 'Fence', 'MiscFeature', 'MiscVal', 'MoSold', 'YrSold', 'SaleType', 'SaleCondition', 'SalePrice'], dtype='object')

1. So... What can we expect?

In order to understand our data, we can look at each variable and try to understand their meaning and relevance to this problem. I know this is time-consuming, but it will give us the flavour of our dataset. In order to have some discipline in our analysis, we can create an Excel spreadsheet with the following columns: Variable - Variable name. Type - Identification of the variable's type. There are two possible values for this field: 'numerical' or 'categorical'. By 'numerical' we mean variables for which the values are numbers, and by 'categorical' we mean variables for which the values are categories. Segment - Identification of the variable's segment. We can define three possible segments: building, space or location. When we say 'building', we mean a variable that relates to the physical characteristics of the building (e.g. 'OverallQual'). When we say 'space', we mean a variable that reports space properties of the house (e.g. 'TotalBsmtSF'). Finally, when we say 'location', we mean a variable that gives information about the place where the house is located (e.g. 'Neighborhood'). Expectation - Our expectation about the variable's influence on 'SalePrice'. We can use a categorical scale with 'High', 'Medium' and 'Low' as possible values. Conclusion - Our conclusions about the importance of the variable, after we give a quick look at the data. We can keep the same categorical scale as in 'Expectation'. Comments - Any general comments that occurred to us. While 'Type' and 'Segment' are just for possible future reference, the column 'Expectation' is important because it will help us develop a 'sixth sense'. To fill this column, we should read the description of all the variables and, one by one, ask ourselves: Do we think about this variable when we are buying a house? (e.g. When we think about the house of our dreams, do we care about its 'Masonry veneer type'?). If so, how important would this variable be? (e.g. What is the impact of having 'Excellent' material on the exterior instead of 'Poor'? And of having 'Excellent' instead of 'Good'?). Is this information already described in any other variable? (e.g. If 'LandContour' gives the flatness of the property, do we really need to know the 'LandSlope'?). After this daunting exercise, we can filter the spreadsheet and look carefully at the variables with 'High' 'Expectation'. Then, we can rush into some scatter plots between those variables and 'SalePrice', filling in the 'Conclusion' column, which is just the correction of our expectations. I went through this process and concluded that the following variables can play an important role in this problem: OverallQual (which is a variable that I don't like because I don't know how it was computed; a funny exercise would be to predict 'OverallQual' using all the other variables available). YearBuilt. TotalBsmtSF. GrLivArea. I ended up with two 'building' variables ('OverallQual' and 'YearBuilt') and two 'space' variables ('TotalBsmtSF' and 'GrLivArea'). This might be a little bit unexpected as it goes against the real estate mantra that all that matters is 'location, location and location'. It is possible that this quick data examination process was a bit harsh for categorical variables. For example, I expected the 'Neighborhood' variable to be more relevant, but after the data examination I ended up excluding it. Maybe this is related to the use of scatter plots instead of boxplots, which are more suitable for categorical variable visualization.
The way we visualize data often influences our conclusions. However, the main point of this exercise was to think a little about our data and expectations, so I think we achieved our goal. Now it's time for 'a little less conversation, a little more action please'. Let's shake it!
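To keep that inventory disciplined, something like the sketch below could serve as a starting skeleton (the rows and values are illustrative placeholders, not the actual spreadsheet from the exercise):

import pandas as pd

# illustrative skeleton of the variable-inventory spreadsheet described above
inventory = pd.DataFrame(
    [
        # Variable,       Type,          Segment,    Expectation, Conclusion, Comments
        ('OverallQual',  'categorical', 'building', 'High',      'High',     ''),
        ('GrLivArea',    'numerical',   'space',    'High',      'High',     ''),
        ('Neighborhood', 'categorical', 'location', 'High',      'Low',      'recheck with boxplots'),
    ],
    columns=['Variable', 'Type', 'Segment', 'Expectation', 'Conclusion', 'Comments'],
)
inventory.to_csv('variable_inventory.csv', index=False)  # or export to Excel if preferred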

2. First things first: analysing 'SalePrice'

'SalePrice' is the reason of our quest. It's like when we're going to a party. We always have a reason to be there. Usually, women are that reason. (disclaimer: adapt it to men, dancing or alcohol, according to your preferences) Using the women analogy, let's build a little story, the story of 'How we met 'SalePrice''. Everything started in our Kaggle party, when we were looking for a dance partner. After a while searching on the dance floor, we saw a girl, near the bar, wearing dance shoes. That's a sign that she's there to dance. We spend much time doing predictive modelling and participating in analytics competitions, so talking with girls is not one of our super powers. Even so, we gave it a try: 'Hi, I'm Kaggly! And you? 'SalePrice'? What a beautiful name! You know 'SalePrice', could you give me some data about you? I just developed a model to calculate the probability of a successful relationship between two people. I'd like to apply it to us!' #descriptive statistics summary df_train [ 'SalePrice' ].describe () count 1460.000000 mean 180921.195890 std 79442.502883 min 34900.000000 25% 129975.000000 50% 163000.000000 75% 214000.000000 max 755000.000000 Name: SalePrice, dtype: float64 'Very well... It seems that your minimum price is larger than zero. Excellent! You don't have one of those personal traits that would destroy my model! Do you have any picture that you can send me? I don't know... like, you in the beach... or maybe a selfie in the gym?' #histogram sns.distplot ( df_train [ 'SalePrice' ]); 'Ah! I see that you use seaborn makeup when you're going out... That's so elegant! I also see that you: Deviate from the normal distribution. Have appreciable positive skewness. Show peakedness. This is getting interesting! 'SalePrice', could you give me your body measures?' #skewness and kurtosis print( "Skewness: %f " % df_train [ 'SalePrice' ].skew ()) print( "Kurtosis: %f " % df_train [ 'SalePrice' ].kurt ()) Skewness: 1.882876 Kurtosis: 6.536282 'Amazing! If my love calculator is correct, our success probability is 97.834657%. I think we should meet again! Please, keep my number and give me a call if you're free next Friday. See you in a while, crocodile!'

'SalePrice', her buddies and her interests

It is military wisdom to choose the terrain where you will fight. As soon as 'SalePrice' walked away, we went to Facebook. Yes, now this is getting serious. Notice that this is not stalking. It's just an intense research of an individual, if you know what I mean. According to her profile, we have some common friends. Besides Chuck Norris, we both know 'GrLivArea' and 'TotalBsmtSF'. Moreover, we also have common interests such as 'OverallQual' and 'YearBuilt'. This looks promising! To take the most out of our research, we will start by looking carefully at the profiles of our common friends and later we will focus on our common interests.

Relationship with numerical variables

#scatter plot grlivarea/saleprice var = 'GrLivArea' data = pd.concat ([ df_train [ 'SalePrice' ], df_train [ var ]], axis = 1 ) data.plot.scatter ( x = var , y = 'SalePrice' , ylim = ( 0 , 800000 )); Hmmm... It seems that 'SalePrice' and 'GrLivArea' are really old friends, with a linear relationship. And what about 'TotalBsmtSF'? #scatter plot totalbsmtsf/saleprice var = 'TotalBsmtSF' data = pd.concat ([ df_train [ 'SalePrice' ], df_train [ var ]], axis = 1 ) data.plot.scatter ( x = var , y = 'SalePrice' , ylim = ( 0 , 800000 )); 'TotalBsmtSF' is also a great friend of 'SalePrice' but this seems a much more emotional relationship! Everything is ok and suddenly, in a strong linear (exponential?) reaction, everything changes. Moreover, it's clear that sometimes 'TotalBsmtSF' closes in itself and gives zero credit to 'SalePrice'.

Relationship with categorical features

#box plot overallqual/saleprice var = 'OverallQual' data = pd.concat ([ df_train [ 'SalePrice' ], df_train [ var ]], axis = 1 ) f , ax = plt.subplots ( figsize = ( 8 , 6 )) fig = sns.boxplot ( x = var , y = "SalePrice" , data = data ) fig.axis ( ymin = 0 , ymax = 800000 ); Like all the pretty girls, 'SalePrice' enjoys 'OverallQual'. Note to self: consider whether McDonald's is suitable for the first date. var = 'YearBuilt' data = pd.concat ([ df_train [ 'SalePrice' ], df_train [ var ]], axis = 1 ) f , ax = plt.subplots ( figsize = ( 16 , 8 )) fig = sns.boxplot ( x = var , y = "SalePrice" , data = data ) fig.axis ( ymin = 0 , ymax = 800000 ); plt.xticks ( rotation = 90 ); Although it's not a strong tendency, I'd say that 'SalePrice' is more prone to spend more money on new stuff than on old relics. Note : we don't know if 'SalePrice' is in constant prices. Constant prices try to remove the effect of inflation. If 'SalePrice' is not in constant prices, it should be, so that prices are comparable over the years.

In summary

Stories aside, we can conclude that: 'GrLivArea' and 'TotalBsmtSF' seem to be linearly related with 'SalePrice'. Both relationships are positive, which means that as one variable increases, the other also increases. In the case of 'TotalBsmtSF', we can see that the slope of the linear relationship is particularly high. 'OverallQual' and 'YearBuilt' also seem to be related with 'SalePrice'. The relationship seems to be stronger in the case of 'OverallQual', where the box plot shows how sale prices increase with the overall quality. We just analysed four variables, but there are many others that we should analyse. The trick here seems to be the choice of the right features (feature selection) and not the definition of complex relationships between them (feature engineering). That said, let's separate the wheat from the chaff.

3. Keep calm and work smart

Until now we just followed our intuition and analysed the variables we thought were important. In spite of our efforts to give an objective character to our analysis, we must say that our starting point was subjective. As an engineer, I don't feel comfortable with this approach. All my education was about developing a disciplined mind, able to withstand the winds of subjectivity. There's a reason for that. Try to be subjective in structural engineering and you will see physics making things fall down. It can hurt. So, let's overcome inertia and do a more objective analysis.

The 'plasma soup'

'In the very beginning there was nothing except for a plasma soup. What is known of these brief moments in time, at the start of our study of cosmology, is largely conjectural. However, science has devised some sketch of what probably happened, based on what is known about the universe today.' (source: http://umich.edu/~gs265/bigbang.htm ) To explore the universe, we will start with some practical recipes to make sense of our 'plasma soup': Correlation matrix (heatmap style). 'SalePrice' correlation matrix (zoomed heatmap style). Scatter plots between the most correlated variables (move like Jagger style).

Correlation matrix (heatmap style)

#correlation matrix corrmat = df_train.corr() f , ax = plt.subplots ( figsize = ( 12 , 9 )) sns.heatmap ( corrmat , vmax =.8 , square = True ); In my opinion, this heatmap is the best way to get a quick overview of our 'plasma soup' and its relationships. (Thank you @seaborn!) At first sight, there are two red colored squares that get my attention. The first one refers to the 'TotalBsmtSF' and '1stFlrSF' variables, and the second one refers to the 'Garage X ' variables. Both cases show how significant the correlation is between these variables. Actually, this correlation is so strong that it can indicate a situation of multicollinearity. If we think about these variables, we can conclude that they give almost the same information, so multicollinearity really occurs. Heatmaps are great for detecting this kind of situation, and in problems dominated by feature selection, like ours, they are an essential tool. Another thing that got my attention was the 'SalePrice' correlations. We can see our well-known 'GrLivArea', 'TotalBsmtSF', and 'OverallQual' saying a big 'Hi!', but we can also see many other variables that should be taken into account. That's what we will do next.

'SalePrice' correlation matrix (zoomed heatmap style)

#saleprice correlation matrix k = 10 #number of variables for heatmap cols = corrmat.nlargest ( k , 'SalePrice' )[ 'SalePrice' ].index cm = np.corrcoef ( df_train [ cols ].values.T ) sns.set ( font_scale = 1.25 ) hm = sns.heatmap ( cm , cbar = True , annot = True , square = True , fmt = '.2f' , annot_kws = { 'size' : 10 }, yticklabels = cols.values , xticklabels = cols.values ) plt.show () According to our crystal ball, these are the variables most correlated with 'SalePrice'. My thoughts on this: 'OverallQual', 'GrLivArea' and 'TotalBsmtSF' are strongly correlated with 'SalePrice'. Check! 'GarageCars' and 'GarageArea' are also some of the most strongly correlated variables. However, as we discussed in the last sub-point, the number of cars that fit into the garage is a consequence of the garage area. 'GarageCars' and 'GarageArea' are like twin brothers. You'll never be able to distinguish them. Therefore, we just need one of these variables in our analysis (we can keep 'GarageCars' since its correlation with 'SalePrice' is higher). 'TotalBsmtSF' and '1stFloor' also seem to be twin brothers. We can keep 'TotalBsmtSF' just to say that our first guess was right (re-read 'So... What can we expect?'). 'FullBath'?? Really? 'TotRmsAbvGrd' and 'GrLivArea', twin brothers again. Is this dataset from Chernobyl? Ah... 'YearBuilt'... It seems that 'YearBuilt' is slightly correlated with 'SalePrice'. Honestly, it scares me to think about 'YearBuilt' because I start feeling that we should do a little bit of time-series analysis to get this right. I'll leave this as a homework for you. Let's proceed to the scatter plots.

Scatter plots between 'SalePrice' and correlated variables (move like Jagger style)

Get ready for what you're about to see. I must confess that the first time I saw these scatter plots I was totally blown away! So much information in such a short space... It's just amazing. Once more, thank you @seaborn! You make me 'move like Jagger'! #scatterplot sns.set () cols = [ 'SalePrice' , 'OverallQual' , 'GrLivArea' , 'GarageCars' , 'TotalBsmtSF' , 'FullBath' , 'YearBuilt' ] sns.pairplot ( df_train [ cols ], size = 2.5 ) plt.show (); Although we already know some of the main figures, this mega scatter plot gives us a reasonable idea about variable relationships. One of the figures we may find interesting is the one between 'TotalBsmtSF' and 'GrLivArea'. In this figure we can see the dots drawing a straight line, which almost acts like a border. It totally makes sense that the majority of the dots stay below that line. Basement areas can be equal to the above-ground living area, but we don't expect a basement area bigger than the above-ground living area (unless you're trying to buy a bunker). The plot concerning 'SalePrice' and 'YearBuilt' can also make us think. At the bottom of the 'dots cloud', we see what almost appears to be a shy exponential function (be creative). We can also see this same tendency in the upper limit of the 'dots cloud' (be even more creative). Also, notice how the set of dots regarding the last years tends to stay above this limit (I just wanted to say that prices are increasing faster now). Ok, enough of Rorschach test for now. Let's move forward to what's missing: missing data!

4. Missing data

Important questions when thinking about missing data: How prevalent is the missing data? Is missing data random or does it have a pattern? The answer to these questions is important for practical reasons because missing data can imply a reduction of the sample size. This can prevent us from proceeding with the analysis. Moreover, from a substantive perspective, we need to ensure that the missing data process is not biased and hiding an inconvenient truth. #missing data total = df_train.isnull ().sum ().sort_values ( ascending = False ) percent = ( df_train.isnull ().sum () / df_train.isnull ().count ()).sort_values ( ascending = False ) missing_data = pd.concat ([ total , percent ], axis = 1 , keys = [ 'Total' , 'Percent' ]) missing_data.head ( 20 )
Total Percent
PoolQC 1453 0.995205
MiscFeature 1406 0.963014
Alley 1369 0.937671
Fence 1179 0.807534
FireplaceQu 690 0.472603
LotFrontage 259 0.177397
GarageCond 81 0.055479
GarageType 81 0.055479
GarageYrBlt 81 0.055479
GarageFinish 81 0.055479
GarageQual 81 0.055479
BsmtExposure 38 0.026027
BsmtFinType2 38 0.026027
BsmtFinType1 37 0.025342
BsmtCond 37 0.025342
BsmtQual 37 0.025342
MasVnrArea 8 0.005479
MasVnrType 8 0.005479
Electrical 1 0.000685
Utilities 0 0.000000
Let's analyse this to understand how to handle the missing data. We'll consider that when more than 15% of the data is missing, we should delete the corresponding variable and pretend it never existed. This means that we will not try any trick to fill the missing data in these cases. According to this, there is a set of variables (e.g. 'PoolQC', 'MiscFeature', 'Alley', etc.) that we should delete. The point is: will we miss this data? I don't think so. None of these variables seem to be very important, since most of them are not aspects in which we think about when buying a house (maybe that's the reason why data is missing?). Moreover, looking closer at the variables, we could say that variables like 'PoolQC', 'MiscFeature' and 'FireplaceQu' are strong candidates for outliers, so we'll be happy to delete them. In what concerns the remaining cases, we can see that 'Garage X ' variables have the same number of missing data. I bet missing data refers to the same set of observations (although I will not check it; it's just 5% and we should not spend 20 in 5 problems). Since the most important information regarding garages is expressed by 'GarageCars' and considering that we are just talking about 5% of missing data, I'll delete the mentioned 'Garage X ' variables. The same logic applies to 'Bsmt X ' variables. Regarding 'MasVnrArea' and 'MasVnrType', we can consider that these variables are not essential. Furthermore, they have a strong correlation with 'YearBuilt' and 'OverallQual' which are already considered. Thus, we will not lose information if we delete 'MasVnrArea' and 'MasVnrType'. Finally, we have one missing observation in 'Electrical'. Since it is just one observation, we'll delete this observation and keep the variable. In summary, to handle missing data, we'll delete all the variables with missing data, except the variable 'Electrical'. In 'Electrical' we'll just delete the observation with missing data. #dealing with missing data df_train = df_train.drop (( missing_data [ missing_data [ 'Total' ] > 1 ]). index , 1 ) df_train = df_train.drop ( df_train.loc [ df_train [ 'Electrical' ]. isnull ()]. index ) df_train.isnull ().sum ().max () #just checking that there's no missing data missing... 0

Out liars!

Outliers are also something that we should be aware of. Why? Because outliers can markedly affect our models and can be a valuable source of information, providing us insights about specific behaviours. Outliers are a complex subject that deserves more attention. Here, we'll just do a quick analysis through the standard deviation of 'SalePrice' and a set of scatter plots.

Univariate analysis

The primary concern here is to establish a threshold that defines an observation as an outlier. To do so, we'll standardize the data. In this context, data standardization means converting data values to have a mean of 0 and a standard deviation of 1. #standardizing data saleprice_scaled = StandardScaler ().fit_transform ( df_train [ 'SalePrice' ][:, np. newaxis ]); low_range = saleprice_scaled [ saleprice_scaled [:, 0 ]. argsort ()][: 10 ] high_range = saleprice_scaled [ saleprice_scaled [:, 0 ]. argsort ()][ - 10 :] print( 'outer range (low) of the distribution:' ) print( low_range ) print( ' \n outer range (high) of the distribution:' ) print( high_range ) outer range (low) of the distribution: [[-1.83820775] [-1.83303414] [-1.80044422] [-1.78282123] [-1.77400974] [-1.62295562] [-1.6166617 ] [-1.58519209] [-1.58519209] [-1.57269236]] outer range (high) of the distribution: [[3.82758058] [4.0395221 ] [4.49473628] [4.70872962] [4.728631 ] [5.06034585] [5.42191907] [5.58987866] [7.10041987] [7.22629831]] How 'SalePrice' looks with her new clothes: Low range values are similar and not too far from 0. High range values are far from 0 and the 7.something values are really out of range. For now, we'll not consider any of these values as an outlier, but we should be careful with those two 7.something values.

Bivariate analysis

We already know the following scatter plots by heart. However, when we look at things from a new perspective, there's always something to discover. As Alan Kay said, 'a change in perspective is worth 80 IQ points'. #bivariate analysis saleprice/grlivarea var = 'GrLivArea' data = pd.concat ([ df_train [ 'SalePrice' ], df_train [ var ]], axis = 1 ) data. plot. scatter ( x = var , y = 'SalePrice' , ylim = ( 0 , 800000 )); What has been revealed: The two values with bigger 'GrLivArea' seem strange and they are not following the crowd. We can speculate why this is happening. Maybe they refer to agricultural area and that could explain the low price. I'm not sure about this, but I'm quite confident that these two points are not representative of the typical case. Therefore, we'll define them as outliers and delete them. The two observations at the top of the plot are those 7.something observations that we said we should be careful about. They look like two special cases, however they seem to be following the trend. For that reason, we will keep them. #deleting points df_train.sort_values ( by = 'GrLivArea' , ascending = False )[: 2 ] df_train = df_train.drop ( df_train [ df_train [ 'Id' ] == 1299 ]. index ) df_train = df_train.drop ( df_train [ df_train [ 'Id' ] == 524 ]. index ) #bivariate analysis saleprice/totalbsmtsf var = 'TotalBsmtSF' data = pd.concat ([ df_train [ 'SalePrice' ], df_train [ var ]], axis = 1 ) data. plot. scatter ( x = var , y = 'SalePrice' , ylim = ( 0 , 800000 )); We can feel tempted to eliminate some observations (e.g. TotalBsmtSF > 3000) but I suppose it's not worth it. We can live with that, so we'll not do anything.

5. Getting hard core

In Ayn Rand's novel, 'Atlas Shrugged', there is an often-repeated question: who is John Galt? A big part of the book is about the quest to discover the answer to this question. I feel Randian now. Who is 'SalePrice'? The answer to this question lies in testing for the assumptions underlying the statistical bases for multivariate analysis. We already did some data cleaning and discovered a lot about 'SalePrice'. Now it's time to go deep and understand how 'SalePrice' complies with the statistical assumptions that enable us to apply multivariate techniques. According to Hair et al. (2013), four assumptions should be tested: Normality - When we talk about normality what we mean is that the data should look like a normal distribution. This is important because several statistical tests rely on this (e.g. t-statistics). In this exercise we'll just check univariate normality for 'SalePrice' (which is a limited approach). Remember that univariate normality doesn't ensure multivariate normality (which is what we would like to have), but it helps. Another detail to take into account is that in big samples (>200 observations) normality is not such an issue. However, if we solve normality, we avoid a lot of other problems (e.g. heteroscedasticity), so that's the main reason why we are doing this analysis. Homoscedasticity - I just hope I wrote it right. Homoscedasticity refers to the 'assumption that dependent variable(s) exhibit equal levels of variance across the range of predictor variable(s)' (Hair et al., 2013). Homoscedasticity is desirable because we want the error term to be the same across all values of the independent variables. Linearity - The most common way to assess linearity is to examine scatter plots and search for linear patterns. If patterns are not linear, it would be worthwhile to explore data transformations. However, we'll not get into this because most of the scatter plots we've seen appear to have linear relationships. Absence of correlated errors - Correlated errors, like the definition suggests, happen when one error is correlated to another. For instance, if a positive error is systematically paired with a negative error, it means that there's a relationship between these variables. This occurs often in time series, where some patterns are time related. We'll also not get into this. However, if you detect something, try to add a variable that can explain the effect you're getting. That's the most common solution for correlated errors. What do you think Elvis would say about this long explanation? 'A little less conversation, a little more action please'? Probably... By the way, do you know what was Elvis's last great hit? (...) The bathroom floor.

In the search for normality

The point here is to test 'SalePrice' in a very lean way. We'll do this paying attention to: Histogram - Kurtosis and skewness. Normal probability plot - Data distribution should closely follow the diagonal that represents the normal distribution. #histogram and normal probability plot sns.distplot ( df_train [ 'SalePrice' ], fit = norm ); fig = plt.figure () res = stats. probplot ( df_train [ 'SalePrice' ], plot = plt ) Ok, 'SalePrice' is not normal. It shows 'peakedness', positive skewness and does not follow the diagonal line. But everything's not lost. A simple data transformation can solve the problem. This is one of the awesome things you can learn in statistics books: in case of positive skewness, log transformations usually work well. When I discovered this, I felt like a Hogwarts student discovering a new cool spell. Avada kedavra! #applying log transformation df_train [ 'SalePrice' ] = np. log ( df_train [ 'SalePrice' ]) #transformed histogram and normal probability plot sns.distplot ( df_train [ 'SalePrice' ], fit = norm ); fig = plt.figure () res = stats. probplot ( df_train [ 'SalePrice' ], plot = plt ) Done! Let's check what's going on with 'GrLivArea'. #histogram and normal probability plot sns.distplot ( df_train [ 'GrLivArea' ], fit = norm ); fig = plt.figure () res = stats. probplot ( df_train [ 'GrLivArea' ], plot = plt ) Tastes like skewness... Avada kedavra! #data transformation df_train [ 'GrLivArea' ] = np. log ( df_train [ 'GrLivArea' ]) #transformed histogram and normal probability plot sns.distplot ( df_train [ 'GrLivArea' ], fit = norm ); fig = plt.figure () res = stats. probplot ( df_train [ 'GrLivArea' ], plot = plt ) Next, please... #histogram and normal probability plot sns.distplot ( df_train [ 'TotalBsmtSF' ], fit = norm ); fig = plt.figure () res = stats. probplot ( df_train [ 'TotalBsmtSF' ], plot = plt ) Ok, now we are dealing with the big boss. What do we have here? Something that, in general, presents skewness. A significant number of observations with value zero (houses without a basement). A big problem, because the value zero doesn't allow us to do log transformations. To apply a log transformation here, we'll create a variable that captures the effect of having a basement or not (a binary variable). Then, we'll do a log transformation on all the non-zero observations, ignoring those with value zero. This way we can transform the data without losing the effect of having a basement or not. I'm not sure if this approach is correct. It just seemed right to me. That's what I call 'high risk engineering'. #create column for new variable (one is enough because it's a binary categorical feature) #if area>0 it gets 1, for area==0 it gets 0 df_train [ 'HasBsmt' ] = pd.Series ( len ( df_train [ 'TotalBsmtSF' ]), index = df_train.index ) df_train [ 'HasBsmt' ] = 0 df_train.loc [ df_train [ 'TotalBsmtSF' ] > 0 , 'HasBsmt' ] = 1 #transform data df_train.loc [ df_train [ 'HasBsmt' ] == 1 , 'TotalBsmtSF' ] = np. log ( df_train [ 'TotalBsmtSF' ]) #histogram and normal probability plot sns.distplot ( df_train [ df_train [ 'TotalBsmtSF' ] > 0 ][ 'TotalBsmtSF' ], fit = norm ); fig = plt.figure () res = stats. probplot ( df_train [ df_train [ 'TotalBsmtSF' ] > 0 ][ 'TotalBsmtSF' ], plot = plt )

In the search for writing 'homoscedasticity' right at the first attempt

The best approach to test homoscedasticity for two metric variables is graphical. Departures from an equal dispersion are shown by such shapes as cones (small dispersion at one side of the graph, large dispersion at the opposite side) or diamonds (a large number of points at the center of the distribution). Starting with 'SalePrice' and 'GrLivArea'... #scatter plot plt.scatter ( df_train [ 'GrLivArea' ], df_train [ 'SalePrice' ]); Older versions of this scatter plot (previous to the log transformations) had a conic shape (go back and check 'Scatter plots between 'SalePrice' and correlated variables (move like Jagger style)'). As you can see, the current scatter plot doesn't have a conic shape anymore. That's the power of normality! Just by ensuring normality in some variables, we solved the homoscedasticity problem. Now let's check 'SalePrice' with 'TotalBsmtSF'. #scatter plot plt.scatter ( df_train [ df_train [ 'TotalBsmtSF' ] > 0 ][ 'TotalBsmtSF' ], df_train [ df_train [ 'TotalBsmtSF' ] > 0 ][ 'SalePrice' ]); We can say that, in general, 'SalePrice' exhibits equal levels of variance across the range of 'TotalBsmtSF'. Cool!

Last but not least, dummy variables

Easy mode. #convert categorical variable into dummy df_train = pd.get_dummies ( df_train )

Conclusion

That's it! We reached the end of our exercise. Throughout this kernel we put in practice many of the strategies proposed by Hair et al. (2013). We philosophized about the variables, we analysed 'SalePrice' alone and with the most correlated variables, we dealt with missing data and outliers, we tested some of the fundamental statistical assumptions and we even transformed categorical variables into dummy variables. That's a lot of work that Python helped us make easier. But the quest is not over. Remember that our story stopped in the Facebook research. Now it's time to give a call to 'SalePrice' and invite her to dinner. Try to predict her behaviour. Do you think she's a girl that enjoys regularized linear regression approaches? Or do you think she prefers ensemble methods? Or maybe something else? It's up to you to find out.

8 Handy Python Tips

This article shares a few Python tricks that are a little out of the ordinary; enjoy the fun Python brings you.

1. print: printing coloured output. We all know Python's print function; we usually use it to print things as a simple debugging aid. But did you know the colour of the printed text can be set? A small example:

def esc(code=0):
    return f'\033[{code}m'

print(esc('31;1;0') + 'Error:' + esc() + 'important')

Running this in a console or in PyCharm gives: Error: important, where 'Error:' is red with highlighting and 'important' is in the default colour. The format is \033[display-mode;foreground;background m, with the following parameters:

Foreground  Background  Colour
30          40          black
31          41          red
32          42          green
33          43          yellow
34          44          blue
35          45          magenta
36          46          cyan
37          47          white

Display mode  Meaning
0             terminal default
1             highlight
4             underline
5             blink
7             reverse video
8             invisible

Example: [1;31;40m means highlight, red foreground, black background.

2. Using a scheduler in Python. A very human-friendly scheduling module is schedule (around 6,400 GitHub stars at the time of writing); it follows the "For Humans" philosophy. Address: https://github.com/dbader/schedule
1) Install it with pip: pip install schedule
2) Usage example:

import schedule
import time

def job():
    print("I'm working...")

schedule.every(10).minutes.do(job)
schedule.every().hour.do(job)
schedule.every().day.at("10:30").do(job)
schedule.every().monday.do(job)
schedule.every().wednesday.at("13:15").do(job)
schedule.every().minute.at(":17").do(job)

while True:
    schedule.run_pending()
    time.sleep(1)

You can tell what each line does just from the words. For example, schedule.every().monday.do(job) means exactly what it reads like: run the function job every Monday. Simple, isn't it?

3. Implementing a progress bar.

from time import sleep

def progress(percent=0, width=30):
    left = width * percent // 100
    right = width - left
    print('\r[', '#' * left, ' ' * right, ']', f' {percent}%', sep='', end='', flush=True)

for i in range(101):
    progress(i)
    sleep(0.1)

Go ahead and try it. The print call above uses a few useful parameters: sep is the separator placed between the arguments (a space by default; here it is the empty string so the characters sit tightly together), end is what gets printed at the end (a newline by default; here it is also the empty string so the bar is redrawn on the same line), and flush controls flushing. By default flush=False and the output may be buffered in memory; with flush=True the content is flushed and displayed immediately.

4. Pretty-printing nested data. You have probably noticed that printing a JSON string or a dictionary gives you a blob with no visible structure; the problem is the output format.

import json
my_mapping = {'a': 23, 'b': 42, 'c': 0xc0ffee}
print(json.dumps(my_mapping, indent=4, sort_keys=True))

Try printing my_mapping with a plain print and compare it with this method. If we want to print a list of dictionaries, json.dumps is not always suitable, but the standard library's pprint achieves the same effect:

import pprint
my_mapping = [{'a': 23, 'b': 42, 'c': 0xc0ffee}]
pprint.pprint(my_mapping, width=4)

5. Defining simple classes with namedtuple and dataclass. Sometimes we want something class-like without any complicated methods to implement; in that case consider the two approaches below. First, namedtuple ("named tuple"), a module of the standard library's collections package, can stand in for a simple class:

from collections import namedtuple

# a simple class can be replaced by a namedtuple
Car = namedtuple('Car', 'color mileage')

my_car = Car('red', 3812.4)
print(my_car.color)
print(my_car)

All fields have to be declared up front, though; if you later want my_car.name you have to change the definition to Car = namedtuple('Car', 'color mileage name'). The second option, dataclasses, reads even more like a class definition:

from dataclasses import dataclass

@dataclass
class Car:
    color: str
    mileage: float

my_car = Car('red', 3812.4)
print(my_car.color)
print(my_car)

6. The f-string conversions !r, !a, !s. f-strings appeared in Python 3.6 and are currently the nicest way to build strings. The structure of an f-string is:

f'<text> { <expression> <optional !conversion> <optional : format specifier> } <text> ...'

where !s calls str() on the expression, !r calls repr(), and !a calls ascii(). (1) By default an f-string uses str(), but by adding the !r conversion flag you can make it use repr():

class Comedian:
    def __init__(self, first_name, last_name, age):
        self.first_name = first_name
        self.last_name = last_name
        self.age = age

    def __str__(self):
        return f"{self.first_name} {self.last_name} is {self.age}."

    def __repr__(self):
        return f"{self.first_name} {self.last_name} is {self.age}. Surprise!"

>>> new_comedian = Comedian("Eric", "Idle", "74")
>>> f"{new_comedian}"
'Eric Idle is 74.'
>>> f"{new_comedian!r}"
'Eric Idle is 74. Surprise!'

(2) An example with a string a:

>>> a = 'some string'
>>> f'{a!r}'

is equivalent to

>>> f'{repr(a)}'

Python 3.8 builds on this idea, but not through a !d conversion; it uses the f"{a=}" form instead (if you were expecting !d, note that it does not exist).

7. The "=" specifier in f-strings. Python 3.8 added this feature:

a = 5
print(f'{a=}')   # prints a=5

Isn't that convenient? No need to write f"a={a}" any more.

8. The walrus operator :=

a = 6
if (b := a + 1) > 6:
    print(b)

You can assign and use the value in the same expression, similar to assignment in Go. The code first evaluates a + 1 to get 7, then assigns 7 to b, so it is equivalent to:

b = 7
if b > 6:
    print(b)

Quite a bit simpler, isn't it? Note that this only works from Python 3.8 onwards.

Summary: that's it for today. These are mostly small, scattered bits of knowledge, collected here to share.

Socket Programming in Python: Client, Server

Socket Programming in Python: Client, Server, and Peer Examples

list all functions in a Python module

To list the names defined in a module interactively: print(dir(os)) # show all names in the module. To print them one per line: for i in dir(os): print(i). The inspect module can go further and filter for actual functions. Also see the pydoc module, the help() function in the interactive interpreter, and the pydoc command-line tool, which generates the documentation you are after: help(os)
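A small sketch of the inspect-based approach mentioned above, using the standard os module as the example target:

import inspect
import os

# keep only Python-level functions (use inspect.isbuiltin for C-implemented ones)
functions = [name for name, obj in inspect.getmembers(os, inspect.isfunction)]
print(functions[:10])

# the docstring of a single function
print(inspect.getdoc(os.getcwd))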

Python Functions

Python Functions

Python Control Flow

Python Control Flow

Python elegant way to read lines of file into list

For most cases, to read the lines of a file into a list: with open(fileName) as f: lineList = f.readlines() In this case, every element in the list contains a \n at the end of the string, which can be extremely annoying in some cases. The same problem appears if you use: lineList = list() with open(fileName) as f: for line in f: lineList.append(line) To overcome this, use: lineList = [line.rstrip('\n') for line in open(fileName)]
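One caveat with that last one-liner: open(fileName) is never explicitly closed. A variant that strips the newlines and still closes the file (a minimal sketch, assuming a plain text file):

with open(fileName) as f:
    lineList = f.read().splitlines()   # splitlines() drops the trailing '\n' characters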

Call a function from another file in Python

If you have a file a.py and inside you have some functions: def b(): # Something return 1 def c(): # Something return 2 And you want to import them in z.py you have to write from a import b, c
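For completeness, a sketch of what z.py could then look like (assuming a.py sits in the same directory or elsewhere on the import path):

# z.py
from a import b, c

print(b())   # -> 1
print(c())   # -> 2

# alternatively, import the module itself and qualify the names
import a
print(a.b() + a.c())   # -> 3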

Run JavaScript from Python

Js2Py Run JavaScript from Python Run JavaScript from Python
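A minimal sketch of the Js2Py route (pip install js2py); the JavaScript function here is just an illustrative example:

import js2py

# evaluate a JavaScript expression directly
print(js2py.eval_js('2 + 3 * 4'))            # 14

# translate a JS function into a callable Python object
add = js2py.eval_js('function add(a, b) { return a + b; }')
print(add(3, 4))                              # 7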

Python and Selenium Extract Local Storage

Plain HTTP libraries in high-level languages only request and extract raw HTML; they do not start a browser instance or run JavaScript, so they cannot see the browser's local storage. If we want to access local storage while scraping a page, we need to drive a real browser instance and use its JavaScript engine to read the storage. Selenium is the best-known solution. PhantomJS, a headless browser, used to be a possible replacement for Selenium, but its development has since been discontinued.

JavaScript to iterate over the browser's localStorage object

for (var i = 0; i < localStorage.length; i++){ key=localStorage.key(i); console.log(key+': '+localStorage.getItem(key)); }

Advanced script

As mentioned here, an HTML5-capable browser also implements Array.prototype.map, so the script can be written as: Array.apply(0, new Array(localStorage.length)).map(function (o, i){ return localStorage.key(i)+':'+localStorage.getItem(localStorage.key(i)); })

Python with Selenium script for setting up and scraping local storage

from selenium import webdriver driver = webdriver.Firefox() url='http://www.w3schools.com/' driver.get(url) scriptArray="""localStorage.setItem("key1", 'new item'); localStorage.setItem("key2", 'second item'); return Array.apply(0, new Array(localStorage.length)).map(function (o, i) { return localStorage.getItem(localStorage.key(i)); })""" result = driver.execute_script(scriptArray) print(result)

Selenium

Python Web-scraping with Selenium
Selenium is a tool for testing web applications. Selenium tests run directly in the browser, just as a real user would operate it. Supported browsers include IE (7, 8, 9, 10, 11), Mozilla Firefox, Safari, Google Chrome, Opera, Edge and so on. 0. Preparation Before the feature demos below, we need to install the Chrome browser and set up ChromeDriver, and of course install the selenium library!

Install the selenium library

pip install selenium

Install the browser driver

There are actually two ways to install the browser driver: the usual manual installation, or automatic installation via a third-party library. The assumption below is that the Chrome browser is already installed. Manual installation: first check your local Chrome version (either way works): type Chrome://version into the address bar to see the version number, or open the Chrome menu, Help, About Google Chrome. Then pick the driver release matching that version number.
Download address: https://chromedriver.storage.googleapis.com/index.html
Finally configure the environment variable, i.e. drop the ChromeDriver executable chromedriver.exe into the Scripts directory of your Python installation. Note: you can also skip this and instead pass the absolute path of chromedriver.exe when creating the driver. Automatic installation: automatic installation uses the third-party library webdriver_manager; install it first, then call the corresponding method. from selenium import webdriver from selenium.webdriver.common.keys import Keys from webdriver_manager.chrome import ChromeDriverManager browser = webdriver.Chrome(ChromeDriverManager().install()) browser.get('http://www.baidu.com') search = browser.find_element_by_id('kw') search.send_keys('python') search.send_keys(Keys.ENTER) # close the browser browser.close() In the code above, ChromeDriverManager().install() is the automatic installation step: it detects the current browser version and downloads the matching driver locally. ====== WebDriver manager ====== Current google-chrome version is 96.0.4664 Get LATEST chromedriver version for 96.0.4664 google-chrome There is no [win32] chromedriver for browser in cache Trying to download new driver from https://chromedriver.storage.googleapis.com/96.0.4664.45/chromedriver_win32.zip Driver has been saved in cache [C:\Users\Gdc\.wdm\drivers\chromedriver\win32\96.0.4664.45] If the driver is already cached locally, it reports that it exists: ====== WebDriver manager ====== Current google-chrome version is 96.0.4664 Get LATEST driver version for 96.0.4664 Driver [C:\Users\Gdc\.wdm\drivers\chromedriver\win32\96.0.4664.45\chromedriver.exe] found in cache With this preparation done, we can start the main content. 1. Basic usage This part covers the basics: initializing the browser object, visiting a page, setting the window size, refreshing the page, and going forward/back.


Initialize the browser object

In the preparation section we mentioned that the browser driver must either be added to the environment variables or be referenced by its absolute path; with the former you can initialize directly, with the latter you must pass the path. from selenium import webdriver # initialize a Chrome browser browser = webdriver.Chrome() # or specify the absolute path path = r'C:\Users\Gdc\.wdm\drivers\chromedriver\win32\96.0.4664.45\chromedriver.exe' browser = webdriver.Chrome(path) # close the browser browser.close() The above opens a browser with a visible window; we can also initialize a headless browser: from selenium import webdriver # headless browser option = webdriver.ChromeOptions() option.add_argument("headless") browser = webdriver.Chrome(options=option) # open the Baidu home page browser.get(r'https://www.baidu.com/') # screenshot preview browser.get_screenshot_as_file('截图.png') # close the browser browser.close() Once the browser object has been initialized and assigned to browser, we can call methods on browser to simulate all kinds of browser operations.

Visit a page

Page access uses the get method; pass the URL of the page to visit. from selenium import webdriver # initialize a Chrome browser browser = webdriver.Chrome() # open the Baidu home page browser.get(r'https://www.baidu.com/') # close the browser browser.close()

Set the browser window size

set_window_size() sets the browser window size (i.e. the resolution), while maximize_window() maximizes the window. from selenium import webdriver import time browser = webdriver.Chrome() # maximize the browser window browser.maximize_window() browser.get(r'https://www.baidu.com') time.sleep(2) # set resolution to 500*500 browser.set_window_size(500,500) time.sleep(2) # set resolution to 1000*800 browser.set_window_size(1000,800) time.sleep(2) # close the browser browser.close() No screenshots here; run it yourself to see the effect.

Refresh the page

Refreshing is a very common operation in the browser; the refresh() method reloads the current page. from selenium import webdriver import time browser = webdriver.Chrome() # maximize the browser window browser.maximize_window() browser.get(r'https://www.baidu.com') time.sleep(2) try: # refresh the page browser.refresh() print('page refreshed') except Exception as e: print('refresh failed') # close the browser browser.close() Run it yourself to see the effect; it is the same as pressing F5.

Forward and back

Forward and back are also very common browser operations; forward() goes forward and back() goes back. from selenium import webdriver import time browser = webdriver.Chrome() # maximize the browser window browser.maximize_window() browser.get(r'https://www.baidu.com') time.sleep(2) # open the Taobao page browser.get(r'https://www.taobao.com') time.sleep(2) # go back to the Baidu page browser.back() time.sleep(2) # go forward to the Taobao page browser.forward() time.sleep(2) # close the browser browser.close() 2. Getting basic page attributes When selenium opens a page, some basic attributes become available, such as the page title, URL, browser name and page source. from selenium import webdriver browser = webdriver.Chrome() browser.get(r'https://www.baidu.com') # page title print(browser.title) # current URL print(browser.current_url) # browser name print(browser.name) # page source print(browser.page_source) Output: 百度一下,你就知道 https://www.baidu.com/ chrome <html><head><script async="" src="https://passport.baidu.com/passApi/js/wrapper.js?cdnversion=1640515789507&_=1640515789298"></script><meta http-equiv="Content-Type" content="text/html;charset=utf-8"><meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"><meta content="always" name="referrer"><meta name="theme-color"..." Note that the page source can then be parsed with regular expressions, Bs4, xpath, pyquery and similar tools to extract the information you want. 3. Locating page elements When we actually use a browser, the important operations are things like typing text and clicking buttons. Selenium provides a series of methods for these; there are commonly said to be 8 ways of locating page elements, and we will demonstrate them one by one, using the search box on the Baidu home page to search for python. The search box's html structure: <input id="kw" name="wd" class="s_ipt" value="" maxlength="255" autocomplete="off">


Locate by id

find_element_by_id() locates an element by its id attribute; here the id is kw. from selenium import webdriver import time browser = webdriver.Chrome() browser.get(r'https://www.baidu.com') time.sleep(2) # type python into the search box browser.find_element_by_id('kw').send_keys('python') time.sleep(2) # close the browser browser.close()

Locate by name

find_element_by_name() locates an element by its name attribute; here the name is wd. from selenium import webdriver import time browser = webdriver.Chrome() browser.get(r'https://www.baidu.com') time.sleep(2) # type python into the search box browser.find_element_by_name('wd').send_keys('python') time.sleep(2) # close the browser browser.close()

Locate by class

find_element_by_class_name() locates an element by its class attribute; here the class is s_ipt. from selenium import webdriver import time browser = webdriver.Chrome() browser.get(r'https://www.baidu.com') time.sleep(2) # type python into the search box browser.find_element_by_class_name('s_ipt').send_keys('python') time.sleep(2) # close the browser browser.close()

Locate by tag

HTML defines functionality through tags: input for input fields, table for tables, and so on. Every element is a tag, and a single tag usually defines a whole class of functionality, so if you look at the Baidu home page source you will find many elements sharing the same tag; it is therefore hard to tell elements apart by tag alone. find_element_by_tag_name() from selenium import webdriver import time browser = webdriver.Chrome() browser.get(r'https://www.baidu.com') time.sleep(2) # type python into the search box browser.find_element_by_tag_name('input').send_keys('python') time.sleep(2) # close the browser browser.close() Because there are multiple input elements, the code above will raise an error; a workaround using find_elements is sketched below.
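A sketch of one way around this, still assuming the Baidu search box as the target: fetch all input tags and filter by attribute (find_elements returns a list instead of raising):

from selenium import webdriver

browser = webdriver.Chrome()
browser.get(r'https://www.baidu.com')

# all <input> elements on the page
inputs = browser.find_elements_by_tag_name('input')

# pick the one whose id is 'kw' (the search box)
search_box = next(el for el in inputs if el.get_attribute('id') == 'kw')
search_box.send_keys('python')

browser.close()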

Locate by link text

As the name suggests, this method locates text links, such as the category links at the top of the Baidu home page. find_element_by_link_text() Using the 新闻 (News) link as an example: from selenium import webdriver import time browser = webdriver.Chrome() browser.get(r'https://www.baidu.com') time.sleep(2) # click the 新闻 link browser.find_element_by_link_text('新闻').click() time.sleep(2) # close all browser windows browser.quit()

Locate by partial link text

Sometimes the text of a link is very long; typing it all out is both tedious and ugly in the code, so we can match on just part of the string for a fuzzy match. find_element_by_partial_link_text() from selenium import webdriver import time browser = webdriver.Chrome() browser.get(r'https://www.baidu.com') time.sleep(2) # click the 新闻 link browser.find_element_by_partial_link_text('闻').click() time.sleep(2) # close all browser windows browser.quit()

Locate by XPath

The locating methods introduced so far work in the ideal case, where every element on the page has a unique id, name, class or link-text attribute that we can use to locate it. Real work is rarely that nice, and then we can only fall back on xpath or css. find_element_by_xpath() from selenium import webdriver import time browser = webdriver.Chrome() browser.get(r'https://www.baidu.com') time.sleep(2) # type python into the search box browser.find_element_by_xpath("//*[@id='kw']").send_keys('python') time.sleep(2) # close the browser browser.close()

Locate by CSS selector

This method is more concise than xpath and locating is faster. find_element_by_css_selector() from selenium import webdriver import time browser = webdriver.Chrome() browser.get(r'https://www.baidu.com') time.sleep(2) # type python into the search box browser.find_element_by_css_selector('#kw').send_keys('python') time.sleep(2) # close the browser browser.close()

Locating with find_element and By

Besides the 8 locating methods above, Selenium also provides a generic method, find_element(), which takes two parameters: the locating strategy and the value. # import the By class first from selenium.webdriver.common.by import By The operations above are then equivalent to: browser.find_element(By.ID,'kw') browser.find_element(By.NAME,'wd') browser.find_element(By.CLASS_NAME,'s_ipt') browser.find_element(By.TAG_NAME,'input') browser.find_element(By.LINK_TEXT,'新闻') browser.find_element(By.PARTIAL_LINK_TEXT,'闻') browser.find_element(By.XPATH,'//*[@id="kw"]') browser.find_element(By.CSS_SELECTOR,'#kw')

3.10. Multiple elements

If the target element appears more than once on the page, use find_elements; the result is a list. In short, element just gets a plural s and everything else works the same (a short sketch follows below). 4. Getting element attributes Now that we have many ways to locate page elements, the next step is to read their attributes, which is especially useful when using Selenium for web scraping.
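A minimal sketch of find_elements, again assuming the Baidu home page as the target (it also previews get_attribute from the next section):

from selenium import webdriver

browser = webdriver.Chrome()
browser.get(r'https://www.baidu.com')

# find_elements_* returns a list (possibly empty) instead of raising NoSuchElementException
links = browser.find_elements_by_tag_name('a')
print(len(links))              # how many <a> elements the page contains

for a in links[:5]:
    print(a.get_attribute('href'))   # attribute of each of the first five links

browser.close()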


Getting attributes with get_attribute

Using the logo on the Baidu home page as an example, get the logo's attributes: <img hidefocus="true" id="s_lg_img" class="index-logo-src" src="//www.baidu.com/img/PCtm_d9c8750bed0b3c7d089fa7d55720d6cf.png" width="270" height="129" onerror="this.src='//www.baidu.com/img/flexible/logo/pc/index.png';this.onerror=null;" usemap="#mp"> Get the logo's image address: from selenium import webdriver import time browser = webdriver.Chrome() browser.get(r'https://www.baidu.com') logo = browser.find_element_by_class_name('index-logo-src') print(logo) print(logo.get_attribute('src')) # close the browser browser.close() Output: <selenium.webdriver.remote.webelement.WebElement (session="e95b18c43a330745af019e0041f0a8a4", element="7dad5fc0-610b-45b6-b543-9e725ee6cc5d")> https://www.baidu.com/img/PCtm_d9c8750bed0b3c7d089fa7d55720d6cf.png

Getting text

Using the hot-search list as an example, get the text and the link of an entry: <a class="title-content tag-width c-link c-font-medium c-line-clamp1" href="https://www.baidu.com/s?cl=3&tn=baidutop10&fr=top1000&wd=各地贯彻十九届六中全会精神纪实&rsv_idx=2&rsv_dl=fyb_n_homepage&sa=fyb_n_homepage&hisfilter=1" target="_blank"><span class="title-content-index c-index-single c-index-single-hot1">1</span><span class="title-content-title">各地贯彻十九届六中全会精神纪实</span></a> The text is read from the text property, which can be called directly. from selenium import webdriver import time browser = webdriver.Chrome() browser.get(r'https://www.baidu.com') logo = browser.find_element_by_css_selector('#hotsearch-content-wrapper > li:nth-child(1) > a') print(logo.text) print(logo.get_attribute('href')) # close the browser browser.close() Output: 1各地贯彻十九届六中全会精神纪实 https://www.baidu.com/s?cl=3&tn=baidutop10&fr=top1000&wd=各地贯彻十九届六中全会精神纪实&rsv_idx=2&rsv_dl=fyb_n_homepage&sa=fyb_n_homepage&hisfilter=1

Getting other properties

Besides attributes and the text value, there are also properties such as id, location, tag name and size. from selenium import webdriver import time browser = webdriver.Chrome() browser.get(r'https://www.baidu.com') logo = browser.find_element_by_class_name('index-logo-src') print(logo.id) print(logo.location) print(logo.tag_name) print(logo.size) # close the browser browser.close() Output: 6af39c9b-70e8-4033-8a74-7201ae09d540 {'x': 490, 'y': 46} img {'height': 129, 'width': 270} 5. Page interaction Page interaction means the various operations inside the browser, such as the text input and link clicking demonstrated above, plus things like clearing text, pressing Enter, and selecting radio buttons and checkboxes.


Entering text

We already used this in earlier sections. send_keys() from selenium import webdriver import time browser = webdriver.Chrome() browser.get(r'https://www.baidu.com') time.sleep(2) # locate the search box input = browser.find_element_by_class_name('s_ipt') # type python input.send_keys('python') time.sleep(2) # close the browser browser.close()

Clicking

We have used clicking before as well. click() from selenium import webdriver import time browser = webdriver.Chrome() browser.get(r'https://www.baidu.com') time.sleep(2) # select the 新闻 button click = browser.find_element_by_link_text('新闻') # click it click.click() time.sleep(2) # close all browser windows browser.quit()

Clearing text

Since there is input, there is also clearing. clear() from selenium import webdriver import time browser = webdriver.Chrome() browser.get(r'https://www.baidu.com') time.sleep(2) # locate the search box input = browser.find_element_by_class_name('s_ipt') # type python input.send_keys('python') time.sleep(2) # clear python input.clear() time.sleep(2) # close the browser browser.close()

Submitting with Enter

For example, type python into the search box and then submit it to get the query results. submit() from selenium import webdriver import time browser = webdriver.Chrome() browser.get(r'https://www.baidu.com') time.sleep(2) # locate the search box input = browser.find_element_by_class_name('s_ipt') # type python input.send_keys('python') time.sleep(2) # submit the query input.submit() time.sleep(5) # close the browser browser.close()

Radio buttons

Radio buttons are easy to handle: locate the radio element, then click it once (see the sketch after the checkbox note below).

Checkboxes

Checkboxes are similarly easy: locate each element to be selected and click it in turn.
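A small sketch covering both cases; the page URL and the type/value attributes here are made-up placeholders, not a real site:

from selenium import webdriver

browser = webdriver.Chrome()
browser.get('http://example.com/form')   # hypothetical form page

# radio button: locate it and click once
radio = browser.find_element_by_css_selector("input[type='radio'][value='male']")
radio.click()

# checkboxes: locate each one and click in turn
for box in browser.find_elements_by_css_selector("input[type='checkbox']"):
    if not box.is_selected():            # avoid un-ticking boxes that are already selected
        box.click()

browser.close()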

Drop-down boxes

Drop-down boxes are a bit more involved and need the Select module. Import the class first: from selenium.webdriver.support.select import Select The Select module provides the following methods:
'''1. Three ways to choose an option''' select_by_index() # select by index; note: the index starts from 0. select_by_value() # select by the value attribute of the option tag. select_by_visible_text() # select by the text shown in the drop-down. '''2. Three ways to return option information''' options # all options of the select element all_selected_options # all currently selected options of the select element first_selected_options # the first selected option of the select element '''3. Four ways to deselect''' deselect_all # deselect every selected option deselect_by_index # deselect the option at the given index deselect_by_value # deselect the option with the given value deselect_by_visible_text # deselect the option with the given visible text
Let's demonstrate. Since I did not find a suitable public page, I wrote a small page for local testing (saved as 帅哥.html): <html> <body> <form> <select name="帅哥"> <option value="才哥">才哥</option> <option value="小明" selected="">小明</option> <option value="小华">小华</option> <option value="草儿">小草</option> </select> </form> </body> </html> Then demonstrate the different ways of selecting: from selenium import webdriver from selenium.webdriver.support.select import Select import time url = 'file:///C:/Users/Gdc/Desktop/帅哥.html' browser = webdriver.Chrome() browser.get(url) time.sleep(2) # select by index Select(browser.find_element_by_name("帅哥")).select_by_index("2") time.sleep(2) # select by value Select(browser.find_element_by_name("帅哥")).select_by_value("草儿") time.sleep(2) # select by visible text Select(browser.find_element_by_name("帅哥")).select_by_visible_text("才哥") time.sleep(2) # close the browser browser.close() 6. Switching between windows For example, working with nodes inside different sub-pages of the same page, switching between tabs, and switching between browser windows.


Switching frames

When Selenium opens a page, it operates on the parent page by default. If the page contains child frames and you want to reach element nodes inside a child frame, you have to switch into it, which is what switch_to.frame() is for. To return to the parent page, use switch_to.parent_frame().
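A short sketch, reusing the runoob drag-and-drop demo page that appears later in this section, which embeds its result in an iframe named iframeResult:

from selenium import webdriver
import time

browser = webdriver.Chrome()
browser.get('https://www.runoob.com/try/try.php?filename=jqueryui-api-droppable')
time.sleep(2)

# switch into the child frame to reach elements defined inside it
browser.switch_to.frame('iframeResult')
print(browser.find_element_by_css_selector('#draggable').text)

# switch back to the parent page
browser.switch_to.parent_frame()

browser.close()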

Switching tabs

We often open many pages while browsing, and Selenium provides methods to work with them conveniently.
current_window_handle: the handle of the current window. window_handles: the handles of all windows of the current browser. switch_to.window(): switch to the given window.
from selenium import webdriver import time browser = webdriver.Chrome() # open Baidu browser.get('http://www.baidu.com') # open a new tab browser.execute_script('window.open()') print(browser.window_handles) # switch to the second tab and open Zhihu browser.switch_to.window(browser.window_handles[1]) browser.get('http://www.zhihu.com') # go back to the first tab and open Taobao (the original Baidu page becomes Taobao) time.sleep(2) browser.switch_to.window(browser.window_handles[0]) browser.get('http://www.taobao.com') 7. Simulating mouse actions Since we are simulating browser operations, naturally we also need to simulate some mouse actions; this requires importing the ActionChains class. from selenium.webdriver.common.action_chains import ActionChains


Left click

This is simply the click() operation from the page-interaction section.

Right click

context_click() from selenium.webdriver.common.action_chains import ActionChains from selenium import webdriver import time browser = webdriver.Chrome() browser.get(r'https://www.baidu.com') time.sleep(2) # locate the element to right-click, here the 新闻 link right_click = browser.find_element_by_link_text('新闻') # perform the right-click ActionChains(browser).context_click(right_click).perform() time.sleep(2) # close the browser browser.close() In the code above:
ActionChains(browser): instantiate the ActionChains() class, passing the browser driver browser as the argument context_click(right_click): simulate a mouse right click, passing the located element as the argument perform(): execute all actions stored in ActionChains(), i.e. run the whole queued sequence of operations

Double click

double_click() from selenium.webdriver.common.action_chains import ActionChains from selenium import webdriver import time browser = webdriver.Chrome() browser.get(r'https://www.baidu.com') time.sleep(2) # locate the element to double-click double_click = browser.find_element_by_css_selector('#bottom_layer > div > p:nth-child(8) > span') # double click ActionChains(browser).double_click(double_click).perform() time.sleep(15) # close the browser browser.close()

Drag and drop

drag_and_drop(source, target) performs a drag: a start element and an end element must be specified. This is often used for slider-style captchas and similar widgets. We demonstrate with an example from runoob.com:
https://www.runoob.com/try/try.php?filename=jqueryui-api-droppable
from selenium.webdriver.common.action_chains import ActionChains from selenium import webdriver import time browser = webdriver.Chrome() url = 'https://www.runoob.com/try/try.php?filename=jqueryui-api-droppable' browser.get(url) time.sleep(2) browser.switch_to.frame('iframeResult') # start position source = browser.find_element_by_css_selector("#draggable") # end position target = browser.find_element_by_css_selector("#droppable") # perform the drag-and-drop actions = ActionChains(browser) actions.drag_and_drop(source, target) actions.perform() # drag time.sleep(15) # close the browser browser.close()

Hover

move_to_element() from selenium.webdriver.common.action_chains import ActionChains from selenium import webdriver import time browser = webdriver.Chrome() url = 'https://www.baidu.com' browser.get(url) time.sleep(2) # locate the element to hover over move = browser.find_element_by_css_selector("#form > span.bg.s_ipt_wr.new-pmd.quickdelete-wrap > span.soutu-btn") # hover ActionChains(browser).move_to_element(move).perform() time.sleep(5) # close the browser browser.close() 8. Simulating the keyboard The Keys() class in selenium provides most keyboard operations; keys are simulated with the send_keys() method. Import the Keys class first: from selenium.webdriver.common.keys import Keys Common keyboard operations:
send_keys(Keys.BACK_SPACE): backspace (BackSpace) send_keys(Keys.SPACE): space (Space) send_keys(Keys.TAB): tab (TAB) send_keys(Keys.ESCAPE): escape (ESCAPE) send_keys(Keys.ENTER): enter (ENTER) send_keys(Keys.CONTROL,'a'): select all (Ctrl+A) send_keys(Keys.CONTROL,'c'): copy (Ctrl+C) send_keys(Keys.CONTROL,'x'): cut (Ctrl+X) send_keys(Keys.CONTROL,'v'): paste (Ctrl+V) send_keys(Keys.F1): F1 key ..... send_keys(Keys.F12): F12 key
A concrete demo: locate the element to operate on, then send the keys. from selenium.webdriver.common.keys import Keys from selenium import webdriver import time browser = webdriver.Chrome() url = 'https://www.baidu.com' browser.get(url) time.sleep(2) # locate the search box input = browser.find_element_by_class_name('s_ipt') # type python input.send_keys('python') time.sleep(2) # press Enter input.send_keys(Keys.ENTER) time.sleep(5) # close the browser browser.close() 9. Waits If a page loads content with ajax, its elements may not all appear at the same time, so the page source grabbed right after get() returns may not be the fully loaded page. In that case we need to wait for a certain amount of time to make sure all the nodes have loaded. Three approaches are available: forced waits, implicit waits and explicit waits.


Forced wait

This is the simplest option: call time.sleep(n) to wait n seconds unconditionally, right after the get method returns.

Implicit wait

implicitly_wait() sets a waiting time; if some element node has still not loaded when the time is up, an exception is raised. from selenium import webdriver browser = webdriver.Chrome() # implicit wait of 10 seconds browser.implicitly_wait(10) browser.get('https://www.baidu.com') print(browser.current_url) print(browser.title) # close the browser browser.close()

Explicit wait

Set a waiting time and a condition; within the allotted time the condition is checked at regular intervals, and if it holds the program continues, otherwise a timeout exception is raised. from selenium import webdriver from selenium.webdriver.support.wait import WebDriverWait from selenium.webdriver.support import expected_conditions as EC from selenium.webdriver.common.by import By import time browser = webdriver.Chrome() browser.get('https://www.baidu.com') # set the waiting time to 10 s wait = WebDriverWait(browser, 10) # condition: wait until the element with id='kw' has loaded input = wait.until(EC.presence_of_element_located((By.ID, 'kw'))) # type the keyword input.send_keys('Python') # close the browser time.sleep(2) browser.close() WebDriverWait parameters:
WebDriverWait(driver,timeout,poll_frequency=0.5,ignored_exceptions=None) driver: the browser driver timeout: the maximum time to wait (also take any implicit wait into account) poll_frequency: the interval between checks, 0.5 s by default ignored_exceptions: the exception raised on timeout, NoSuchElementException by default until(method,message='') method: called repeatedly during the wait until its return value is not False message: if the wait times out, a TimeoutException is raised with this message until_not(method,message='') until_not is the opposite of until: until continues once the element appears or the condition becomes true, while until_not continues once the element disappears or the condition stops holding; the parameters are the same.
Other wait conditions from selenium.webdriver.support import expected_conditions as EC # the title equals the expected string title_is # the title contains the expected string title_contains # the given element has been loaded presence_of_element_located # all elements have been loaded presence_of_all_elements_located # the element is visible (not hidden and its width and height are non-zero); takes a locator tuple visibility_of_element_located # the element is visible; takes an already located WebElement visibility_of # the element is invisible or absent from the DOM tree invisibility_of_element_located # the element's text contains the expected string text_to_be_present_in_element # the element's value contains the expected string text_to_be_present_in_element_value # the frame can be switched into; takes a locator tuple or directly an id, name, index or WebElement frame_to_be_available_and_switch_to_it # an alert is present alert_is_present # the element is clickable element_to_be_clickable # the element is selected, typically for drop-down lists; takes a WebElement element_to_be_selected # the element (given by a locator) is selected element_located_to_be_selected # the element's selection state matches the expectation; takes a located element, returns True if they match element_selection_state_to_be # the element's selection state matches the expectation; takes a locator, returns True if they match element_located_selection_state_to_be # the element is still attached to the DOM; takes a WebElement; can be used to check whether the page has refreshed staleness_of 10. Miscellaneous A few more additions.


10. Miscellaneous

A few extras.

Running JavaScript

Some actions, such as scrolling the page down, are easiest done by running JavaScript through the execute_script method.
from selenium import webdriver
browser = webdriver.Chrome()
# Zhihu "explore" page
browser.get('https://www.zhihu.com/explore')
browser.execute_script('window.scrollTo(0, document.body.scrollHeight)')
browser.execute_script('alert("To Bottom")')

Cookie

selenium使用过程中,还可以很方便对Cookie进行获取、添加与删除等操作。 from selenium import webdriver browser = webdriver.Chrome() # 知乎发现页 browser.get('https://www.zhihu.com/explore') # 获取cookie print(f'Cookies的值:{browser.get_cookies()}') # 添加cookie browser.add_cookie({'name':'才哥', 'value':'帅哥'}) print(f'添加后Cookies的值:{browser.get_cookies()}') # 删除cookie browser.delete_all_cookies() print(f'删除后Cookies的值:{browser.get_cookies()}') 输出: Cookies的值:[{'domain': '.zhihu.com', 'httpOnly': False, 'name': 'Hm_lpvt_98beee57fd2ef70ccdd5ca52b9740c49', 'path': '/', 'secure': False, 'value': '1640537860'}, {'domain': '.zhihu.com', ...] 添加后Cookies的值:[{'domain': 'www.zhihu.com', 'httpOnly': False, 'name': '才哥', 'path': '/', 'secure': True, 'value': '帅哥'}, {'domain': '.zhihu.com', 'httpOnly': False, 'name': 'Hm_lpvt_98beee57fd2ef70ccdd5ca52b9740c49', 'path': '/', 'secure': False, 'value': '1640537860'}, {'domain': '.zhihu.com',...] 删除后Cookies的值:[]

Anti-detection

Meituan, for example, was found to block Selenium-driven browsers outright. One commonly used workaround is sketched below.
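A sketch of the usual mitigation, assuming Chrome: hide the most obvious automation fingerprints through ChromeOptions and the Chrome DevTools Protocol (no guarantee it defeats every site's detection):
from selenium import webdriver

options = webdriver.ChromeOptions()
# drop the "Chrome is being controlled by automated test software" infobar
options.add_experimental_option('excludeSwitches', ['enable-automation'])
options.add_experimental_option('useAutomationExtension', False)
browser = webdriver.Chrome(options=options)
# overwrite navigator.webdriver before any page script can inspect it
browser.execute_cdp_cmd('Page.addScriptToEvaluateOnNewDocument', {
    'source': "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"
})
browser.get('https://www.meituan.com')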


execute_script

Python doesn't provide a way to directly read/write the local storage, but it can be done with execute_script. driver.execute_script("window.localStorage;") or: from selenium import webdriver wd = webdriver.Firefox() wd.get("http://localhost/foo/bar") wd.execute_script("return localStorage.getItem('foo')") or: driver.execute_script("window.localStorage.setItem('key','value');"); driver.execute_script("window.localStorage.getItem('key');"); or define class: class LocalStorage: def __init__(self, driver) : self.driver = driver def __len__(self): return self.driver.execute_script("return window.localStorage.length;") def items(self) : return self.driver.execute_script( \ "var ls = window.localStorage, items = {}; " \ "for (var i = 0, k; i < ls.length; ++i) " \ " items[k = ls.key(i)] = ls.getItem(k); " \ "return items; ") def keys(self) : return self.driver.execute_script( \ "var ls = window.localStorage, keys = []; " \ "for (var i = 0; i < ls.length; ++i) " \ " keys[i] = ls.key(i); " \ "return keys; ") def get(self, key): return self.driver.execute_script("return window.localStorage.getItem(arguments[0]);", key) def set(self, key, value): self.driver.execute_script("window.localStorage.setItem(arguments[0], arguments[1]);", key, value) def has(self, key): return key in self.keys() def remove(self, key): self.driver.execute_script("window.localStorage.removeItem(arguments[0]);", key) def clear(self): self.driver.execute_script("window.localStorage.clear();") def __getitem__(self, key) : value = self.get(key) if value is None : raise KeyError(key) return value def __setitem__(self, key, value): self.set(key, value) def __contains__(self, key): return key in self.keys() def __iter__(self): return self.items().__iter__() def __repr__(self): return self.items().__str__() Usage example: # get the local storage storage = LocalStorage(driver) # set an item storage["mykey"] = 1234 storage.set("mykey2", 5678) # get an item print(storage["mykey"]) # raises a KeyError if the key is missing print(storage.get("mykey")) # returns None if the key is missing # delete an item storage.remove("mykey") # iterate items for key, value in storage.items(): print("%s: %s" % (key, value)) # delete items storage.clear()

to list all python packages installed

As of pip version 1.3 you can simply run pip list. From an interactive session, help("modules") prints every importable module. pip freeze outputs the installed packages together with their versions, and its output can be redirected to a file (conventionally requirements.txt) that can later be used to set up an identical environment.
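If you would rather enumerate packages from inside Python, here is a small sketch using only the standard library (importlib.metadata, available since Python 3.8):
import importlib.metadata

# print every installed distribution with its version, similar to `pip list`
for dist in sorted(importlib.metadata.distributions(), key=lambda d: d.metadata['Name'].lower()):
    print(dist.metadata['Name'], dist.version)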

20 Python libraries you can't live without

Requests Scrapy wxPython Pillow SQLAlchemy BeautifulSoup Twisted NumPy SciPy matplotlib Pygame Pyglet pyQT pyGtk Scapy pywin32 nltk nose SymPy IPython 1. Requests. 2. Scrapy. must have library in webscraping 3. wxPython. A gui toolkit for python. I have primarily used it in place of tkinter. 4. Pillow. A friendly fork of PIL (Python Imaging Library). It is more user friendly than PIL and is a must have for anyone who works with images. 5. SQLAlchemy. A database library. Many love it and many hate it. 6. BeautifulSoup. I know it’s slow but this xml and html parsing library is very useful for beginners. 7. Twisted. The most important tool for any network application developer. It has a very beautiful api. 8. NumPy. provides advance math functionalities to python. 9. SciPy. When we talk about NumPy then we have to talk about scipy. It is a library of algorithms and mathematical tools for python and has caused many scientists to switch from ruby to python. 10. matplotlib. A numerical plotting library. It is very useful for any data scientist or any data analyzer. 11. Pygame. game development. 12. Pyglet. A 3d animation and game creation engine. This is the engine in which the famous python port of minecraft was made 13. pyQT. A GUI toolkit for python 14. pyGtk. Another python GUI library 15. Scapy. A packet sniffer and analyzer for python made in python. 16. pywin32. A python library which provides some useful methods and classes for interacting with windows. 17. nltk. Natural Language Toolkit – I realize most people won’t be using this one, but it’s generic enough. It is a very useful library if you want to manipulate strings. But it’s capacity is beyond that. Do check it out. 18. nose. A testing framework for python. It is used by millions of python developers. It is a must have if you do test driven development. 19. SymPy. SymPy can do algebraic evaluation, differentiation, expansion, complex numbers, etc. It is contained in a pure Python distribution. 20. IPython. It is a python prompt on steroids. It has completion, history, shell capabilities, and a lot more. Make sure that you take a look at it. 
Installed Python packages: IPython brain_curses lazy_object_proxy sqlite3 PdbSublimeTextSupport brain_dateutil lesscpy sre_compile PyInstaller brain_fstrings lib2to3 sre_constants PyQt5 brain_functools libfuturize sre_parse Radiobutton brain_gi libpasteurize ssl __future__ brain_hashlib linecache sspi _ast brain_http lineedit sspicon _asyncio brain_io locale stat _asyncio_d brain_mechanize logging statistics _bisect brain_multiprocessing lzma storemagic _blake2 brain_namedtuple_enum macpath string _bootlocale brain_nose macurl2path stringprep _bz2 brain_numpy mailbox struct _bz2_d brain_pkg_resources mailcap subprocess _codecs brain_pytest markupsafe sunau _codecs_cn brain_qt marshal symbol _codecs_hk brain_random math sympy _codecs_iso2022 brain_re matplotlib sympyprinting _codecs_jp brain_six mccabe symtable _codecs_kr brain_ssl mimetypes sys _codecs_tw brain_subprocess mistune sysconfig _collections brain_threading mmap tabnanny _collections_abc brain_typing mmapfile tarfile _compat_pickle brain_uuid mmsystem telnetlib _compression builtins modulefinder tempfile _csv bz2 more_itertools tensorflow _ctypes cProfile mpmath terminado _ctypes_d cachetools msilib test _ctypes_test calendar msvcrt testpath _ctypes_test_d certifi multiprocessing tests _datetime cgi nbconvert textwrap _decimal cgitb nbformat this _decimal_d chardet netbios threading _dummy_thread chunk netrc time _elementtree click nntplib timeit _elementtree_d cmath nose timer _findvs cmd notebook tkinter _functools code nt token _hashlib codecs ntpath tokenize _hashlib_d codeop ntsecuritycon toml _heapq collections nturl2path tornado _imp colorama numbers trace _io colorsys numpy traceback _json commctrl odbc tracemalloc _locale compileall opcode traitlets _lsprof concurrent operator tty _lzma configparser optparse turtle _lzma_d contextlib ordlookup turtledemo _markupbase copy os typed_ast _md5 copyreg pandas types _msi crypt pandocfilters typing _msi_d csv parser unicodedata _multibytecodec ctypes parso unicodedata_d _multiprocessing curses past unittest _multiprocessing_d cycler pathlib uritemplate _opcode cythonmagic pdb urllib _operator datetime pefile urllib3 _osx_support dateutil perfmon uu _overlapped dbi peutils uuid _overlapped_d dbm pickle venv _pickle dde pickleshare warnings _pydecimal decimal pickletools wave _pyio decorator pip wcwidth _pyrsistent_version difflib pipes weakref _random dis pkg_resources webbrowser _sha1 distutils pkgutil webencodings _sha256 doctest platform wheel _sha3 docutils plistlib widgetsnbextension _sha512 dotenv ply win2kras _signal dummy_threading poplib win32api _sitebuiltins easy_install posixpath win32clipboard _socket email pprint( win32com) _socket_d encodings profile win32con _sqlite3 ensurepip progressbar win32console _sqlite3_d entrypoints prometheus_client win32cred _sre enum prompt_toolkit win32crypt _ssl errno pstats win32cryptcon _ssl_d external pty win32ctypes _stat faulthandler py_compile win32event _string filecmp pyasn1 win32evtlog _strptime fileinput pyasn1_modules win32evtlogutil _struct fnmatch pyclbr win32file _symtable formatter pydoc win32gui _testbuffer fractions pydoc_data win32gui_struct _testbuffer_d ftplib pyexpat win32help _testcapi functools pyexpat_d win32inet _testcapi_d future pygments win32inetcon _testconsole garden pylab win32job _testconsole_d gc pylint win32lz _testimportmultiple genericpath pymysql win32net _testimportmultiple_d getopt pyparsing win32netcon _testmultiphase getpass pyqt5_tools win32pdh _testmultiphase_d gettext pyrsistent 
win32pdhquery _thread glob pysrt win32pdhutil _threading_local google_auth_httplib2 pythoncom win32pipe _tkinter googleapiclient pytz win32print _tkinter_d gzip pywin win32process _tracemalloc hashlib pywin32_testutil win32profile _warnings heapq pywintypes win32ras _weakref hmac qtconsole win32rcparser _weakrefset html queue win32security _win32sysloader html5lib quopri win32service _winapi http radian win32serviceutil _winxptheme httplib2 random win32timezone abc idlelib rasutil win32trace adodbapi idna rchitect win32traceutil afxres imaplib re win32transaction aifc imghdr regcheck win32ts altgraph imp regutil win32ui antigravity importlib reprlib win32uiole apiclient importlib_metadata requests win32verstamp appdirs inspect rlcompleter win32wnet argparse inventryList rmagic winerror array io rsa winioctlcon ast ipaddress runpy winnt astroid ipykernel sched winperf asynchat ipykernel_launcher scipy winpty asyncio ipython_genutils secrets winreg asyncore ipywidgets select winsound atexit isapi select_d winsound_d attr isort selectors winxpgui audioop itertools selenium winxptheme autoreload jedi send2trash wrapt autosub jinja2 servicemanager wsgiref base64 json setuptools xdrlib bdb json5 shelve xml binascii jsonschema shlex xmlrpc binhex jupyter shutil xxsubtype bisect jupyter_client signal zipapp black jupyter_console simplegeneric zipfile blackd jupyter_core site zipimport bleach jupyterlab six zipp blib2to3 jupyterlab_server smtpd zlib brain_argparse jupyterthemes smtplib zmq brain_attrs keyword sndhdr brain_builtin_inference kivy socket brain_collections kivy_deps socketserver wxpython

check version: python --version

first Django app

first Django app https://docs.djangoproject.com/en/3.0/intro/tutorial01/ Writing your first Django app, part 1 Check Django is installed $ python -m django --version Install Django $ pip install Django Creat project cd into a directory where you’d like to store your code $ django-admin startproject mysite startproject created: mysite/ manage.py mysite/ __init__.py settings.py urls.py wsgi.py manage.py: A command-line utility that lets you interact with this Django project in various ways. You can read all the details about manage.py in django-admin and manage.py. The inner mysite/ directory is the actual Python package for your project. Its name is the Python package name you’ll need to use to import anything inside it (e.g. mysite.urls). mysite/__init__.py: An empty file that tells Python that this directory should be considered a Python package. If you’re a Python beginner, read more about packages in the official Python docs. mysite/settings.py: Settings/configuration for this Django project. Django settings will tell you all about how settings work. mysite/urls.py: The URL declarations for this Django project; a “table of contents” of your Django-powered site. You can read more about URLs in URL dispatcher. mysite/wsgi.py: An entry-point for WSGI-compatible web servers to serve your project. Change into the outer mysite directory and run the following commands: $ python manage.py runserver The Django development server started. Visit http://127.0.0.1:8000/ with your Web browser to see a “Congratulations!” page! Changing the port $ python manage.py runserver 8080 To listen on all available public IPs (which is useful if you are running Vagrant or want to show off your work on other computers on the network), use: $ python manage.py runserver 0:8000 0 is a shortcut for 0.0.0.0. To create an app, type this: $ python manage.py startapp polls Directory polls created: polls/ __init__.py admin.py apps.py migrations/ __init__.py models.py tests.py views.py Write the first view: polls/views.py from django.http import HttpResponse def index(request): return HttpResponse("Hello, world. You're at the polls index.") To call the view, we need to map it to a URL - and for this we need a URLconf. To create a URLconf in the polls directory, create a file called urls.py. polls/urls.py from django.urls import path from . import views urlpatterns = [ path('', views.index, name='index'), ] The next step is to point the root URLconf at the polls.urls module. In mysite/urls.py, add an import for django.urls.include and insert an include() in the urlpatterns list, so you have: mysite/urls.py from django.contrib import admin from django.urls import include, path urlpatterns = [ path('polls/', include('polls.urls')), path('admin/', admin.site.urls), ] It turns out I was confused because of the multiple directories named "mysite". I wrongly created a urls.py file in the root "mysite" directory (which contains "manage.py"), then pasted in the code from the website. To correct it I deleted this file, went into the mysite/mysite directory (which contains "settings.py"), modified the existing "urls.py" file, and replaced the code with the tutorial code. Guessing on the basis of whatever little information provided in the question, i think you might have forget to add the following import in your urls.py file. from django.conf.urls import include

Logging in Python

Logging in Python # purpose of logging: record progress and problems # 5 levels of logging: notset, debug, info, warning, error, critical # 0 , 10, 20, 30, 40, 50 import logging dir(logging) # check what is inside import math # create log format to show more details # LOG_FORMAT = "%(Levelname)s %(asctime)s - %(message)s" LOG_FORMAT = "%(levelname)s %(asctime)s - %(message)s" # create and configure logger logging.basicConfig(filename = "C:\\Users\\User\\Desktop\\logfile.txt", level=logging.DEBUG, # this set the level to record format = LOG_FORMAT, # this set the output msg format filemode= "w") # this starts a blank file logger = logging.getLogger() # create logger object # test the logger # logger.debug("harmless message.") # logger.info("just some message.") # logger.warning("warning message.") # logger.error("error message.") # logger.critical("thecritical message.") # print(logger.level) def quadratic(a, b, c): """ return the quadratic solution ax^2 + bx + c =0. """ logger.info("quadratic({0},{1},{2})".format(a, b, c)) # compute the discriminant logger.debug("# compute the discriminant") disc = b**2 - 4*a*c # compute the two roots logger.debug("# compute the two roots") root1 = (-b + math.sqrt(disc)) / (2*a) root2 = (-b - math.sqrt(disc)) / (2*a) # return the roots logger.debug("# return the roots") return (root1, root2) roots = quadratic(1,0,1) print(roots)

Python Pandas Tutorial

Python Pandas SciPy Tutorial

Edge Detection

Canny Edge Detection in OpenCV
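A minimal Canny sketch; it assumes opencv-python and matplotlib are installed and that a local image test.jpg exists, and the 100/200 thresholds are only illustrative:
import cv2
from matplotlib import pyplot as plt

img = cv2.imread('test.jpg', cv2.IMREAD_GRAYSCALE)  # Canny works on a single-channel image
edges = cv2.Canny(img, 100, 200)                    # lower / upper hysteresis thresholds

plt.subplot(121); plt.imshow(img, cmap='gray'); plt.title('Original')
plt.subplot(122); plt.imshow(edges, cmap='gray'); plt.title('Edges')
plt.show()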

11 个最佳的 Python 编译器和解释器

大多数极客认为 Python 是解释性语言,但它也存在编译过程。 编译部分在代码执行时完成,并被删除。 然后编译内容被转换为字节码。 通过机器和操作系统进一步扩展到 Python 虚拟机。 本文介绍了适用于 Python 程序员的 11 种最佳的 Python 编译器和解释器。 最好的 Python 编译器和解释器 1.Brython Brython 是一种流行的 Python 编译器,可将 Python 转换为 Javascript 代码。 它提供对所有 Web 浏览器(包括一种手机 Web 浏览器)的支持。 它还支持最新的 Html5/CSS3 规范,可以使用流行的 CSS 框架,如 BootStrap3 和 LESS。 https://brython.info 2. Pyjs Pyjs 是一个丰富的 Internet 应用程序框架,也是一种轻量级的 Python 编译器,可以从 Web 浏览器直接执行 Python 脚本,可以从浏览器的 JS 控制台执行程序。 它是从 Python 到 Javascript 的编译器,可以使代码在 Web 浏览器上运行。 它带有 Ajax 框架和 Widget Set API。 http://pyjs.org 3. WinPython 它是为 Windows 操作系统设计的。 它有一些 CPython 的特性。 它预装了一些针对数据科学和机器学习的流行库,例如 Numpy、Pandas 和 Scipy。 它带有 C/C++ 编译器,大多数时候不会用到。 除此之外,它只有 Python 编译器,没有其它包。 https://winpython.github.io 4.Skulpt Skulpt 是 Python 的浏览器版实现,可以被添加到 HTML 代码中。 此 Python 编译器使用 Javascript 编写,在客户端运行代码,无需其它插件、加工或服务器支持。 Skulpt 解释器通过导入方式,来执行保存在网站上的 .py 文件中的代码。 https://skulpt.org 5.Shed Skin 该编译器将 Python 标准库模块编译为 C++,它将静态类型的 Python 程序转换为很受限的优化的 C++ 代码。 通过将其内置的 Python 数据类型再次实现为自己的类集(可以用 C++ 高效实现),可以提高性能。 https://en.wikipedia.org/wiki/Shed_Skin 6.Active Python 这是用于 Windows、Linux 和 Mac Os 的 Python 发行版,有免费的社区版。 它支持在许多平台安装,某些不被 Python-like 的 AIX 支持的平台,它也支持。 它提供了比 Python 更多的兼容性。 https://www.activestate.com/products/activepython 7.Transcrypt 它是一种流行的将 Python 代码编译为简单易读的 Java 代码的编译器。 它是一个轻量级的 Python 编译器,支持对矩阵和向量运算进行切片。 Transcrypt 也可以在 Node.js 上运行。 分层模块、多重继承和本地类给其添加了很多功能。 8. Nutika 这是一种源码到源码的 Python 编译器,可以将 Python 源代码转换为 C/C++ 可执行代码。 它会使用到许多 Python 库和扩展模块。 它自带 Anaconda,可用于创建数据科学和机器学习项目。 9. Jython 它用 Java 编写,可以在运行 JVM 的任何平台上执行。 Jython 将 Python代码编译为 Java 字节码,从而做到跨平台。 它可用于创建 Servelets、Swing、SWT 和 AWT 软件包的解决方案。 Jython 使用 CPython 之类的全局解释器锁(GIL) 。 另外,你可以将 Java 类扩展到 Python 代码。 https://www.jython.org 10. CPython CPython 是默认的且使用最广泛的 Python 编译器。 它是用 C 语言编写的,并使用 GIL(全局解释器锁),这使得并发 CPython 进程之间的通信很困难。 CPython 中的编译步骤包括: 解码、令牌化、解析、抽象语法树和编译。 https://compilers.pydata.org 11. IronPython 此版本的 Python 编译器是在微软的 .Net 框架和 Mono 上实现的。 它还提供了动态编译和交互式控制台。 它使得安装非常容易,并且具有跨平台兼容性。 它还具有标准库和不同的模块,主要用于实现 .Net 框架的用户界面库。 https://ironpython.net 结论 Python 是一种为许多实现提供了可能的开发语言,例如 Python 到 Java,Python 到 Javascript 或其它。 Top 7 Free Python Compilers and Interpreters interpreter CPython, IronPython, ActivePython, Stackless Python compiler Nuitka, Brython, PyJS, Shed Skin, Skulpt, Transcrypt, WinPython PyJS translates your Python code into JavaScript to let it run in a browser client-side web and desktop applications Execute PYTHON Online PYTHON Online PYTHON Online compiler

Python 小贴士和技巧

元旦过完了,我们都纷纷回到了各自的工作岗位。 新的一年新气象,我想借本文为大家献上 Python 语言的30个最佳实践、小贴士和技巧,希望能对各位勤劳的程序员有所帮助,并希望大家工作顺利! 1. Python 版本 在此想提醒各位:自2020年1月1日起,Python 官方不再支持 Python 2。 本文中的很多示例只能在 Python 3 中运行。 如果你仍在使用 Python 2.7,请立即升级。 2. 检查 Python 的最低版本 你可以在代码中检查 Python 的版本,以确保你的用户没有在不兼容的版本中运行脚本。 检查方式如下: if not sys.version_info > (2, 7): # berate your user for running a 10 year # python version elif not sys.version_info >= (3, 5): # Kindly tell your user (s)he needs to upgrade # because you're using 3.5 features 3. IPython IPython 本质上就是一个增强版的shell。 就冲着自动补齐就值得一试,而且它的功能还不止于此,它还有很多令我爱不释手的命令,例如: %cd:改变当前的工作目录 %edit:打开编辑器,并关闭编辑器后执行键入的代码 %env:显示当前环境变量 %pip install [pkgs]:无需离开交互式shell,就可以安装软件包 %time 和 %timeit:测量执行Python代码的时间 完整的命令列表,请点击此处查看(https://ipython.readthedocs.io/en/stable/interactive/magics.html)。 还有一个非常实用的功能:引用上一个命令的输出。 In 和 Out 是实际的对象。 你可以通过 Out[3] 的形式使用第三个命令的输出。 IPython 的安装命令如下: pip3 install ipython 4. 列表推导式 你可以利用列表推导式,避免使用循环填充列表时的繁琐。 列表推导式的基本语法如下: [ expression for item in list if conditional ] 举一个基本的例子:用一组有序数字填充一个列表: mylist = [i for i in range(10)] print(mylist) # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] 由于可以使用表达式,所以你也可以做一些算术运算: squares = [x**2 for x in range(10)] print(squares) # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81] 甚至可以调用外部函数: def some_function(a): return (a + 5) / 2 my_formula = [some_function(i) for i in range(10)] print(my_formula) # [2, 3, 3, 4, 4, 5, 5, 6, 6, 7] 最后,你还可以使用 ‘if’ 来过滤列表。 在如下示例中,我们只保留能被2整除的数字: filtered = [i for i in range(20) if i%2==0] print(filtered) # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18] 5. 检查对象使用内存的状况 你可以利用 sys.getsizeof() 来检查对象使用内存的状况: import sys mylist = range(0, 10000) print(sys.getsizeof(mylist)) # 48 等等,为什么这个巨大的列表仅包含48个字节? 因为这里的 range 函数返回了一个类,只不过它的行为就像一个列表。 在使用内存方面,range 远比实际的数字列表更加高效。 你可以试试看使用列表推导式创建一个范围相同的数字列表: import sys myreallist = [x for x in range(0, 10000)] print(sys.getsizeof(myreallist)) # 87632 6. 返回多个值 Python 中的函数可以返回一个以上的变量,而且还无需使用字典、列表或类。 如下所示: def get_user(id): # fetch user from database # .... return name, birthdate name, birthdate = get_user(4) 如果返回值的数量有限当然没问题。 但是,如果返回值的数量超过3个,那么你就应该将返回值放入一个(数据)类中。 7. 使用数据类 Python从版本3.7开始提供数据类。 与常规类或其他方法(比如返回多个值或字典)相比,数据类有几个明显的优势: 数据类的代码量较少 你可以比较数据类,因为数据类提供了 __eq__ 方法 调试的时候,你可以轻松地输出数据类,因为数据类还提供了 __repr__ 方法 数据类需要类型提示,因此可以减少Bug的发生几率 数据类的示例如下: from dataclasses import dataclass @dataclass class Card: rank: str suit: str card = Card("Q", "hearts") print(card == card) # True print(card.rank) # 'Q' print(card) Card(rank='Q', suit='hearts') 详细的使用指南请点击这里(https://realpython.com/python-data-classes/)。 8. 交换变量 如下的小技巧很巧妙,可以为你节省多行代码: a = 1 b = 2 a, b = b, a print((a)) # 2 print((b)) # 1 9. 合并字典(Python 3.5以上的版本) 从Python 3.5开始,合并字典的操作更加简单了: dict1 = { 'a': 1, 'b': 2 } dict2 = { 'b': 3, 'c': 4 } merged = { **dict1, **dict2 } print((merged)) # {'a': 1, 'b': 3, 'c': 4} 如果 key 重复,那么第一个字典中的 key 会被覆盖。 10. 字符串的首字母大写 如下技巧真是一个小可爱: mystring = "10 awesome python tricks" print(mystring.title()) '10 Awesome Python Tricks' 11. 将字符串分割成列表 你可以将字符串分割成一个字符串列表。 在如下示例中,我们利用空格分割各个单词: mystring = "The quick brown fox" mylist = mystring.split(' ') print(mylist) # ['The', 'quick', 'brown', 'fox'] 12. 根据字符串列表创建字符串 与上述技巧相反,我们可以根据字符串列表创建字符串,然后在各个单词之间加入空格: mylist = ['The', 'quick', 'brown', 'fox'] mystring = " ".join(mylist) print(mystring) # 'The quick brown fox' 你可能会问为什么不是 mylist.join(" "),这是个好问题! 根本原因在于,函数 String.join() 不仅可以联接列表,而且还可以联接任何可迭代对象。 将其放在String中是为了避免在多个地方重复实现同一个功能。 13. 
表情符 有些人非常喜欢表情符,而有些人则深恶痛绝。 我在此郑重声明:在分析社交媒体数据时,表情符可以派上大用场。 首先,我们来安装表情符模块: pip3 install emoji 安装完成后,你可以按照如下方式使用: import emoji result = emoji.emojize('Python is :thumbs_up:') print(result) # 'Python is 👍' # You can also reverse this: result = emoji.demojize('Python is 👍') print(result) # 'Python is :thumbs_up:' 更多有关表情符的示例和文档,请点击此处(https://pypi.org/project/emoji/)。 14. 列表切片 列表切片的基本语法如下: a[start:stop:step] start、stop 和 step 都是可选项。 如果不指定,则会使用如下默认值: start:0 end:字符串的结尾 step:1 示例如下: # We can easily create a new list from # the first two elements of a list: first_two = [1, 2, 3, 4, 5][0:2] print(first_two) # [1, 2] # And if we use a step value of 2, # we can skip over every second number # like this: steps = [1, 2, 3, 4, 5][0:5:2] print(steps) # [1, 3, 5] # This works on strings too. In Python, # you can treat a string like a list of # letters: mystring = "abcdefdn nimt"[::2] print(mystring) # 'aced it' 15. 反转字符串和列表 你可以利用如上切片的方法来反转字符串或列表。 只需指定 step 为 -1,就可以反转其中的元素: revstring = "abcdefg"[::-1] print(revstring) # 'gfedcba' revarray = [1, 2, 3, 4, 5][::-1] print(revarray) # [5, 4, 3, 2, 1] 16. 显示猫猫 我终于找到了一个充分的借口可以在我的文章中显示猫猫了,哈哈!当然,你也可以利用它来显示图片。 首先你需要安装 Pillow,这是一个 Python 图片库的分支: pip3 install Pillow 接下来,你可以将如下图片下载到一个名叫 kittens.jpg 的文件中: 然后,你就可以通过如下 Python 代码显示上面的图片: from PIL import Image im = Image.open("kittens.jpg") im.show() print(im.format, im.size, im.mode) # JPEG (1920, 1357) RGB Pillow 还有很多显示该图片之外的功能。 它可以分析、调整大小、过滤、增强、变形等等。 完整的文档,请点击这里(https://pillow.readthedocs.io/en/stable/)。 17. map() Python 有一个自带的函数叫做 map(),语法如下: map(function, something_iterable) 所以,你需要指定一个函数来执行,或者一些东西来执行。 任何可迭代对象都可以。 在如下示例中,我指定了一个列表: def upper(s): return s.upper() mylist = list(map(upper, ['sentence', 'fragment'])) print(mylist) # ['SENTENCE', 'FRAGMENT'] # Convert a string representation of # a number into a list of ints. list_of_ints = list(map(int, "1234567"))) print(list_of_ints) # [1, 2, 3, 4, 5, 6, 7] 你可以仔细看看自己的代码,看看能不能用 map() 替代某处的循环。 18. 获取列表或字符串中的唯一元素 如果你利用函数 set() 创建一个集合,就可以获取某个列表或类似于列表的对象的唯一元素: mylist = [1, 1, 2, 3, 4, 5, 5, 5, 6, 6] print((set(mylist))) # {1, 2, 3, 4, 5, 6} # And since a string can be treated like a # list of letters, you can also get the # unique letters from a string this way: print((set("aaabbbcccdddeeefff"))) # {'a', 'b', 'c', 'd', 'e', 'f'} 19. 查找出现频率最高的值
你可以通过如下方法查找出现频率最高的值: test = [1, 2, 3, 4, 2, 2, 3, 1, 4, 4, 4] print(max(set(test), key = test.count)) # 4 你能看懂上述代码吗?想法搞明白上述代码再往下读。 没看懂?我来告诉你吧: max() 会返回列表的最大值。 参数 key 会接受一个参数函数来自定义排序,在本例中为 test.count。 该函数会应用于迭代对象的每一项。 test.count 是 list 的内置函数。 它接受一个参数,而且还会计算该参数的出现次数。 因此,test.count(1) 将返回2,而 test.count(4) 将返回4。 set(test) 将返回 test 中所有的唯一值,也就是 {1, 2, 3, 4}。 因此,这一行代码完成的操作是:首先获取 test 所有的唯一值,即{1, 2, 3, 4};然后,max 会针对每一个值执行 list.count,并返回最大值。 这一行代码可不是我个人的发明。 20. 创建一个进度条 你可以创建自己的进度条,听起来很有意思。 但是,更简单的方法是使用 progress 包: pip3 install progress 接下来,你就可以轻松地创建进度条了: from progress.bar import Bar bar = Bar('Processing', max=20) for i in range(20): # Do some work bar.next() bar.finish() 21. 在交互式shell中使用_(下划线运算符) 你可以通过下划线运算符获取上一个表达式的结果,例如在 IPython 中,你可以这样操作: In [1]: 3 * 3 Out[1]: 9In [2]: _ + 3 Out[2]: 12 Python Shell 中也可以这样使用。 另外,在 IPython shell 中,你还可以通过 Out[n] 获取表达式 In[n] 的值。 例如,在如上示例中,Out[1] 将返回数字9。 22. 快速创建Web服务器 你可以快速启动一个Web服务,并提供当前目录的内容: python3 -m http.server 当你想与同事共享某个文件,或测试某个简单的HTML网站时,就可以考虑这个方法。 23. 多行字符串 虽然你可以用三重引号将代码中的多行字符串括起来,但是这种做法并不理想。 所有放在三重引号之间的内容都会成为字符串,包括代码的格式,如下所示。 我更喜欢另一种方法,这种方法不仅可以将多行字符串连接在一起,而且还可以保证代码的整洁。 唯一的缺点是你需要明确指定换行符。 s1 = """Multi line strings can be put between triple quotes. It's not ideal when formatting your code though""" print((s1)) # Multi line strings can be put # between triple quotes. It's not ideal # when formatting your code though s2 = ("You can also concatenate multiple\n" + "strings this way, but you'll have to\n" "explicitly put in the newlines") print(s2) # You can also concatenate multiple # strings this way, but you'll have to # explicitly put in the newlines 24. 条件赋值中的三元运算符 这种方法可以让代码更简洁,同时又可以保证代码的可读性: [on_true] if [expression] else [on_false] 示例如下: x = "Success!" if (y == 2) else "Failed!" 25. 统计元素的出现次数 你可以使用集合库中的 Counter 来获取列表中所有唯一元素的出现次数,Counter 会返回一个字典: from collections import Counter mylist = [1, 1, 2, 3, 4, 5, 5, 5, 6, 6] c = Counter(mylist) print(c) # Counter({1: 2, 2: 1, 3: 1, 4: 1, 5: 3, 6: 2}) # And it works on strings too: print(Counter("aaaaabbbbbccccc")) # Counter({'a': 5, 'b': 5, 'c': 5}) 26. 比较运算符的链接 你可以在 Python 中将多个比较运算符链接到一起,如此就可以创建更易读、更简洁的代码: x = 10 # Instead of: if x > 5 and x < 15: print("Yes") # yes # You can also write: if 5 < x < 15: print("Yes") # Yes 27. 添加颜色 你可以通过 Colorama,设置终端的显示颜色: from colorama import Fore, Back, Style print(Fore.RED + 'some red text') print(Back.GREEN + 'and with a green background') print(Style.DIM + 'and in dim text') print(Style.RESET_ALL) print('back to normal now') 28. 日期的处理 python-dateutil 模块作为标准日期模块的补充,提供了非常强大的扩展,你可以通过如下命令安装: pip3 install python-dateutil 你可以利用该库完成很多神奇的操作。 在此我只举一个例子:模糊分析日志文件中的日期: from dateutil.parser import parse logline = 'INFO 2020-01-01T00:00:01 Happy new year, human.' timestamp = parse(log_line, fuzzy=True) print(timestamp) # 2020-01-01 00:00:01 你只需记住:当遇到常规 Python 日期时间功能无法解决的问题时,就可以考虑 python-dateutil ! 29.整数除法 在 Python 2 中,除法运算符(/)默认为整数除法,除非其中一个操作数是浮点数。 因此,你可以这么写: # Python 2 5 / 2 = 2 5 / 2.0 = 2.5 在 Python 3 中,除法运算符(/)默认为浮点除法,而整数除法的运算符为 //。 因此,你需要这么写: Python 3 5 / 2 = 2.5 5 // 2 = 2 这项变更背后的动机,请参阅 PEP-0238(https://www.python.org/dev/peps/pep-0238/)。 30. 通过chardet 来检测字符集 你可以使用 chardet 模块来检测文件的字符集。 在分析大量随机文本时,这个模块十分实用。 安装方法如下: pip install chardet 安装完成后,你就可以使用命令行工具 chardetect 了,使用方法如下: chardetect somefile.txt somefile.txt: ascii with confidence 1.0 你也可以在编程中使用该库,完整的文档请点击这里(https://chardet.readthedocs.io/en/latest/usage.html)。

Python Creating a Menu

def menu():
    print("welcome, \n Option 1\n Option 2\n Option 3\n")
    choice = input()
    if choice == "1":
        print("Option 1")
        menu()
    if choice == "2":
        print("Option 2")
        menu()
    if choice == "3":
        print("Option 3")
        menu()

menu()

Python Lambda Functions

Python Lambda Functions Lambdas, also known as anonymous functions, are small, restricted functions which do not need a name (i.e., an identifier). Today, many modern programming languages like Java, Python, C#, and C++ support lambda functions to add functionality to the languages. Syntax and Examples lambda arguments : expression lambda p1, p2: expression x = lambda a : a + 10 print(x(5)) 15 adder = lambda x, y: x + y print((adder (1, 2))) 3 #A REGULAR FUNCTION def guru( funct, *args ): funct( *args ) def printer_one( arg ): return print((arg)) def printer_two( arg ): print(arg) #CALL A REGULAR FUNCTION guru( printer_one, 'printer 1 REGULAR CALL' ) guru( printer_two, 'printer 2 REGULAR CALL \n' ) #CALL A REGULAR FUNCTION THRU A LAMBDA guru(lambda: printer_one('printer 1 LAMBDA CALL')) guru(lambda: printer_two('printer 2 LAMBDA CALL'))
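Lambdas shine as throwaway key functions for sorted(), map() and filter(); a quick sketch:
pairs = [(2, 'two'), (1, 'one'), (3, 'three')]
# sort by the first element of each tuple
print(sorted(pairs, key=lambda p: p[0]))   # [(1, 'one'), (2, 'two'), (3, 'three')]
# keep the even numbers, then square them
evens_squared = list(map(lambda n: n * n, filter(lambda n: n % 2 == 0, range(10))))
print(evens_squared)                       # [0, 4, 16, 36, 64]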

mysql

import mysql.connector
db = mysql.connector.connect(
    host="localhost",
    user="root",
    passwd="asdf1234",
    database="demo"
)
mycursor = db.cursor()
# mycursor.execute("CREATE TABLE urlTable (titleName varchar(50), urlAddr varchar(100), id int PRIMARY KEY)")
# pass the values as a separate parameter tuple, not inside the SQL string
mycursor.execute("INSERT INTO urlTable (titleName, urlAddr) VALUES (%s, %s)", ("google", "google.com"))
db.commit()
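Reading rows back, as a sketch against the same hypothetical urlTable and credentials used above; parameters are again passed separately rather than formatted into the SQL string:
import mysql.connector

db = mysql.connector.connect(host="localhost", user="root", passwd="asdf1234", database="demo")
mycursor = db.cursor()
mycursor.execute("SELECT titleName, urlAddr FROM urlTable WHERE titleName = %s", ("google",))
for title, url in mycursor.fetchall():
    print(title, url)
db.close()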

subprocess module

The subprocess module allows you to spawn new processes, connect to their input/output/error pipes, and obtain their return codes. This module intends to replace several older modules and functions: os.system os.spawn* The recommended approach to invoking subprocesses is to use the run() function for all use cases it can handle. For more advanced use cases, the underlying Popen interface can be used directly. subprocess.run(args, *, stdin=None, input=None, stdout=None, stderr=None, capture_output=False, shell=False, cwd=None, timeout=None, check=False, encoding=None, errors=None, text=None, env=None, universal_newlines=None) Examples: >>> subprocess.run(["ls", "-l"]) # doesn't capture output >>> subprocess.run("exit 1", shell=True, check=True) >>> subprocess.run(["ls", "-l", "/dev/null"], capture_output=True) CompletedProcess(args=['ls', '-l', '/dev/null'], returncode=0, stdout=b'crw-rw-rw- 1 root root 1, 3 Jan 23 16:23 /dev/null\n', stderr=b'') Popen Constructor Execute a child program in a new process. example of passing some arguments to an external program as a sequence: Popen(["/usr/bin/git", "commit", "-m", "Fixes a bug."]) example to break a shell command into a sequence of arguments. shlex.split() can illustrate how to determine the correct tokenization for args: >>> import shlex, subprocess >>> command_line = input() /bin/vikings -input eggs.txt -output "spam spam.txt" -cmd "echo '$MONEY'" >>> args = shlex.split(command_line) >>> print(args) ['/bin/vikings', '-input', 'eggs.txt', '-output', 'spam spam.txt', '-cmd', "echo '$MONEY'"] >>> p = subprocess.Popen(args) # Success!
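A small sketch of run() with captured, decoded output (capture_output and text both need Python 3.7+; it assumes a python executable is on PATH):
import subprocess

result = subprocess.run(["python", "--version"], capture_output=True, text=True)
print("return code:", result.returncode)
print("stdout:", result.stdout.strip())
print("stderr:", result.stderr.strip())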

SVG drawings

python svgwrite A Python library to create SVG drawings. Python modules for Inkscape extensions svgwrite A Python library to create SVG drawings. a simple example: import svgwrite dwg = svgwrite.Drawing('test.svg', profile='tiny') dwg.add(dwg.line((0, 0), (10, 0), stroke=svgwrite.rgb(10, 10, 16, '%'))) dwg.add(dwg.text('Test', insert=(0, 0.2), fill='red')) dwg.save() As the name svgwrite implies, svgwrite creates new SVG drawings, it does not read existing drawings and also does not import existing drawings, but you can always include other SVG drawings by the <image> entity. Installation with pip: pip install svgwrite or from source: python setup.py install Documentation http://readthedocs.org/docs/svgwrite/ svgwrite can be found on GitHub.com at: http://github.com/mozman/svgwrite.git Inkscape extensions by non developers Inkscape extensions

Python Extract Local Storage

http://scraping.pro/extract-browsers-local-storage-with-python/

Python and Selenium

To access the browser's local storage when scraping a page, we need to invoke both a browser instance and leverage a JavaScript interpreter to read the local storage. For my money, Selenium is the best solution. A possible replacement for Selenium is PhantomJS, running a headless browser.

JaveScript to iterate over localStorage browser object

for (var i = 0; i < localStorage.length; i++){
    key = localStorage.key(i);
    console.log(key + ': ' + localStorage.getItem(key));
}

Advanced script

As mentioned here a HTML5 featured browser should also implement Array.prototype.map. So script would be: Array.apply(0, new Array(localStorage.length)).map(function (o, i) { return localStorage.key(i)+':'+localStorage.getItem(localStorage.key(i)); } )

Python with Selenium script for setting up and scraping local storage

from selenium import webdriver driver = webdriver.Firefox() url='http://www.w3schools.com/' driver.get(url) scriptArray="""localStorage.setItem("key1", 'new item'); localStorage.setItem("key2", 'second item'); return Array.apply(0, new Array(localStorage.length)).map(function (o, i) { return localStorage.getItem(localStorage.key(i)); } )""" result = driver.execute_script(scriptArray) print(result)

Python bindings alternative to Python+Selenium

Some might argue Selenium is inefficient for only local storage extracting. If you think Selenium is too bulky, you might want to try a Python binding with a development framework for desktop, ex. PyQt. Something I might touch on in a later post.
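A rough sketch of that PyQt idea, assuming PyQt5 plus the PyQtWebEngine package are installed; the URL and the JSON round-trip are just one way to hand localStorage back to Python:
import json
import sys
from PyQt5.QtCore import QUrl
from PyQt5.QtWidgets import QApplication
from PyQt5.QtWebEngineWidgets import QWebEngineView

app = QApplication(sys.argv)
view = QWebEngineView()

def dump_local_storage(ok):
    # serialise localStorage to JSON inside the page, then hand the string back to Python
    js = "JSON.stringify(Object.assign({}, window.localStorage));"
    view.page().runJavaScript(js, lambda result: (print(json.loads(result)), app.quit()))

view.loadFinished.connect(dump_local_storage)
view.load(QUrl("http://www.w3schools.com/"))
view.show()
app.exec_()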

List of Python Modules

Web: Requests: https://pypi.org/project/requests/ Django: https://pypi.org/project/Django/ Flask: https://pypi.org/project/Flask/ Twisted: https://twistedmatrix.com/trac/ BeautifulSoup: https://pypi.org/project/beautifulsoup4/ Selenium: https://selenium-python.readthedocs.io/ Data science: Numpy: https://numpy.org/ Pandas: https://pandas.pydata.org/ Matplotlib: https://matplotlib.org/ Nltk: https://www.nltk.org/ Opencv: https://opencv-python-tutroals.readth... Machine Learning: Tensorflow: https://www.tensorflow.org/ Keras: https://keras.io/ PyTorch: https://pytorch.org/ Sci-kit Learn: https://scikit-learn.org/stable/ GUI: Kivy: https://kivy.org/#home PyQt5: https://pypi.org/project/PyQt5/ Tkinter: https://wiki.python.org/moin/TkInter Bonus: Pygame: https://www.pygame.org/docs/

Running Python in the Browser

Running Python in the web browser has been getting a lot of attention lately. Shaun Taylor-Morgan knows what he's talking about here – he works for Anvil, a full-featured application platform for writing full-stack web apps with nothing but Python. So I invited him to give us an overview and comparison of the open-source solutions for running Python code in your web browser. In the past, if you wanted to build a web UI, your only choice was JavaScript. That's no longer true. There are quite a few ways to run Python in your web browser. This is a survey of what's available. I'm looking at six systems that all take a different approach to the problem. Here's a diagram that sums up their differences. The x-axis answers the question: when does Python get compiled? At one extreme, you run a command-line script to compile Python yourself. At the other extreme, the compilation gets done in the user's browser as they write Python code. The y-axis answers the question: what does Python get compiled to? Three systems make a direct conversion between the Python you write and some equivalent JavaScript. The other three actually run a live Python interpreter in your browser, each in a slightly different way.

1. TRANSCRYPT

Transcrypt gives you a command-line tool you can run to compile a Python script into a JavaScript file. You interact with the page structure (the DOM) using a toolbox of specialized Python objects and functions. For example, if you import document, you can find any object on the page by using document like a dictionary. To get the element whose ID is name-box, you would use document["name-box"]. Any readers familiar with JQuery will be feeling very at home. Here's a basic example. I wrote a Hello, World page with just an input box and a button: <input id="name-box" placeholder="Enter your name"> <button id="greet-button">Say Hello</button> To make it do something, I wrote some Python. When you click the button, an event handler fires that displays an alert with a greeting: def greet(): alert("Hello " + document.getElementById("name-box").value + "!") document.getElementById("greet-button").addEventListener('click', greet) I wrote this in a file called hello.py and compiled it using transcrypt hello.py. The compiler spat out a JavaScript version of my file, called hello.js. Transcrypt makes the conversion to JavaScript at the earliest possible time – before the browser is even running. Next we'll look at Brython, which makes the conversion on page load.

2. BRYTHON

Brython lets you write Python in script tags in exactly the same way you write JavaScript. Just as with Transcrypt, it has a document object for interacting with the DOM. The same widget I wrote above can be written in a script tag like this: <script type="text/python"> from browser import document, alert def greet(event): alert("Hello " + document["name-box"].value + "!") document["greet-button"].bind("click", greet) </script> Pretty cool, huh? A script tag whose type is text/python! There's a good explanation of how it works on the Brython GitHub page. In short, you run a function when your page loads: <body onload="brython()"> that transpiles anything it finds in a Python script tag: <script type="text/python"></script> which results in some machine-generated JavaScript that it runs using JS's eval() function.

3. SKULPT

Skulpt sits at the far end of our diagram – it compiles Python to JavaScript at runtime. This means the Python doesn't have to be written until after the page has loaded. The Skulpt website has a Python REPL that runs in your browser. It's not making requests back to a Python interpreter on a server somewhere, it's actually running on your machine. Skulpt does not have a built-in way to interact with the DOM. This can be an advantage, because you can build your own DOM manipulation system depending on what you're trying to achieve. More on this later. Skulpt was originally created to produce educational tools that need a live Python session on a web page (example: Trinket.io). While Transcrypt and Brython are designed as direct replacements for JavaScript, Skulpt is more suited to building Python programming environments on the web (such as the full-stack app platform, Anvil). We've reached the end of the x-axis in our diagram. Next we head in the vertical direction: our final three technologies don't compile Python to JavaScript, they actually implement a Python runtime in the web browser.

4. PYPY.JS

PyPy.js is a JavaScript implementation of a Python interpreter. The developers took a C-to-JavaScript compiler called emscripten and ran it on the source code of PyPy. The result is PyPy, but running in your browser. Advantages: It's a very faithful implementation of Python, and code gets executed quickly. Disadvantages: A web page that embeds PyPy.js contains an entire Python interpreter, so it's pretty big as web pages go (think megabytes). You import the interpreter using <script> tags, and you get an object called pypyjs in the global JS scope. There are three main functions for interacting with the interpreter. To execute some Python, run pypyjs.exec(<python code>). To pass values between JavaScript and Python, use pypyjs.set(variable, value) and pypyjs.get(variable). Here's a script that uses PyPy.js to calculate the first ten square numbers: <script type="text/javascript"> pypyjs.exec( // Run some Python 'y = [x**2. for x in range(10)]' ).then(function() { // Transfer the value of y from Python to JavaScript pypyjs.get('y') }).then(function(result) { // Display an alert box with the value of y in it alert(result) }); </script> PyPy.js has a few features that make it feel like a native Python environment – there's even an in-memory filesystem so you can read and write files. There's also a document object that gives you access to the DOM from Python. The project has a great readme if you're interested in learning more.

5. BATAVIA

Batavia is a bit like PyPy.js, but it runs bytecode rather than Python. Here's a Hello, World script written in Batavia: <script id="batavia-helloworld" type="application/python-bytecode"> 7gwNCkIUE1cWAAAA4wAAAAAAAAAAAAAAAAIAAABAAAAAcw4AAABlAABkAACDAQABZAEAUykCegtI ZWxsbyBXb3JsZE4pAdoFcHJpbnSpAHICAAAAcgIAAAD6PC92YXIvZm9sZGVycy85cC9uenY0MGxf OTc0ZGRocDFoZnJjY2JwdzgwMDAwZ24vVC90bXB4amMzZXJyddoIPG1vZHVsZT4BAAAAcwAAAAA= </script> Bytecode is the ‘assembly language' of the Python virtual machine – if you've ever looked at the .pyc files Python generates, that's what they contain (Yasoob dug into some bytecode in a recent post on this blog). This example doesn't look like assembly language because it's base64-encoded. Batavia is potentially faster than PyPy.js, since it doesn't have to compile your Python to bytecode. It also makes the download smaller – around 400kB. The disadvantage is that your code needs to be written and compiled in a native (non-browser) environment, as was the case with Transcrypt. Again, Batavia lets you manipulate the DOM using a Python module it provides (in this case it's called dom). The Batavia project is quite promising because it fills an otherwise unfilled niche – ahead-of-time compiled Python in the browser that runs in a full Python VM. Unfortunately, the GitHub repo's commit rate seems to have slowed in the past year or so. If you're interested in helping out, here's their developer guide.

6. PYODIDE

Mozilla's Pyodide was announced in April 2019. It solves a difficult problem: interactive data visualisation in Python, in the browser. Python has become a favourite language for data science thanks to libraries such as NumPy, SciPy, Matplotlib and Pandas. We already have Jupyter Notebooks, which are a great way to present a data pipeline online, but they must be hosted on a server somewhere. If you can put the data processing on the user's machine, they avoid the round-trip to your server so real-time visualisation is more powerful. And you can scale to so many more users if their own machines are providing the compute. It's easier said than done. Fortunately, the Mozilla team came across a version of the reference Python implementation (CPython) that was compiled into WebAssembly. WebAssembly is a low-level complement to JavaScript that performs closer to native speeds, which opens the browser up for performance-critical applications like this. Mozilla took charge of the WebAssembly CPython project and recompiled NumPy, SciPy, Matplotlib and Pandas into WebAssembly too. The result is a lot like Jupyter Notebooks in the browser – here's an introductory notebook. It's an even bigger download than PyPy.js (that example is around 50MB), but as Mozilla point out, a good browser will cache that for you. And for a data processing notebook, waiting a few seconds for the page to load is not a problem. You can write HTML, Markdown and JavaScript in Pyodide Notebooks too. And yes, there's a document object to access the DOM. It's a really promising project!
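For flavour, Python running inside Pyodide reaches the DOM through the js bridge module; this sketch assumes it is executed in a browser page that has loaded Pyodide and that contains an element with id 'output':
from js import document

# the js module proxies the browser's global objects to Python
out = document.getElementById("output")
out.innerHTML = "Hello from Python running in Pyodide"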

MAKING A CHOICE

I've given you six different ways to write Python in the browser, and you might be able to find more. Which one to choose? This summary table may help you decide.
There's a more general point here too: the fact that there is a choice. As a web developer, it often feels like you have to write JavaScript, you have to build an HTTP API, you have to write SQL and HTML and CSS. The six systems we've looked at make JavaScript seem more like a language that gets compiled to, and you choose what to compile to it (And WebAssembly is actually designed to be used this way). Why not treat the whole web stack this way? The future of web development is to move beyond the technologies that we've always ‘had' to use. The future is to build abstractions on top of those technologies, to reduce the unnecessary complexity and optimise developer efficiency. That's why Python itself is so popular – it's a language that puts developer efficiency first.

ONE UNIFIED SYSTEM

There should be one way to represent data, from the database all the way to the UI. Since we're Pythonistas, we'd like everything to be a Python object, not an SQL SELECT statement followed by a Python object followed by JSON followed by a JavaScript object followed by a DOM element. That's what Anvil does – it's a full-stack Python environment that abstracts away the complexity of the web. Here's a 7-minute video that covers how it works. Remember I said that it can be an advantage that Skulpt doesn't have a built-in way to interact with the DOM? This is why. If you want to go beyond ‘Python in the browser' and build a fully-integrated Python environment, your abstraction of the User Interface needs to fit in with your overall abstraction of the web system. So Python in the browser is just the start of something bigger. I like to live dangerously, so I'm going to make a prediction. In 5 years' time, more than 50% of web apps will be built with tools that sit one abstraction level higher than JavaScript frameworks such as React and Angular. It has already happened for static sites: most people who want a static site will use WordPress or Wix rather than firing up a text editor and writing HTML. As systems mature, they become unified and the amount of incidental complexity gradually minimises.

Brython tutorial

This tutorial explains how to develop an application that runs in the browser using the Python programming language. We will take the example of writing a calculator. You will need a text editor, and of course a browser with an Internet access. The contents of this tutorial assumes that you have at least a basic knowledge of HTML (general page structure, most usual tags), of stylesheets (CSS) and of the Python language. In the text editor, create an HTML page with the following content: <!doctype html> <html> <head> <meta charset="utf-8"> <script type="text/javascript" src="https://cdn.jsdelivr.net/npm/brython@3.8.9/brython.min.js"> </script> </head> <body onload="brython()"> <script type="text/python"> from browser import document document <= "Hello !" </script> </body> </html> In an empty directory, save this page as index.html. To read it in the browser, you have two options: use the File/Open menu: it is the most simple solution. It brings some limitations for an advanced use, but it works perfectly for this tutorial launch a web server : for instance, if the Python interpreter available from python.org is available on your machine, run python -m http.server in the file directory, then enter localhost:8000/index.html in the browser address bar When you open the page, you should see the message "Hello !" printed on the browser window.

Page structure

Let's take a look at the page contents. In the <head> zone we load the script brython.js : it is the Brython engine, the program that will find and execute the Python scripts included in the page. In this example we get it from a CDN, so that there is nothing to install on the PC. Note the version number (brython@3.8.9) : it can be updated for each new Brython version. The <body> tag has an attribute onload="brython()". It means that when the page has finished loading, the browser has to call the function brython(), which is defined in the Brython engine loaded in the page. The function searches all the <script>tags that have the attribute type="text/python" and executes them. Our index.html page embeds this script: from browser import document document <= "Hello !" This is a standard Python program, starting by the import of a module, browser (in this case, a module shipped with the Brython engine brython.js). The module has an attribute document which references the content displayed in the browser window. To add a text to the document - concretely, to display a text in the browser - the syntax used by Brython is document <= "Hello !" You can think of the <= sign as a left arrow : the document "receives" a new element, here the string "Hello !". You will see later that it is always possible to use the standardized DOM syntax to interact with the page, by Brython provides a few shortcuts to make the code less verbose.

Text formatting with HTML tags

HTML tags allow text formatting, for instance to write it in bold letters (<B> tag), in italic (<I>), etc. With Brython, these tags are available as functions defined in module html of the browser package. Here is how to use it: from browser import document, html document <= html.B("Hello !") Tags can be nested: document <= html.B(html.I("Hello !")) Tags can also be added to each other, as well as strings: document <= html.B("Hello, ") + "world !" The first argument of a tag function can be a string, a number, another tag. It can also be a Python "iterable" (list, comprehension, generator): in this case, all the elements produced in the iteration are added to the tag: document <= html.UL(html.LI(i) for i in range(5)) Tag attributes are passed as keyword arguments to the function: html.A("Brython", href="http://brython.info")

Drawing the calculator

We can draw our calculator as an HTML table. The first line is made of the result zone, followed by a reset button. The next 3 lines are the calculator touches, digits and operations. from browser import document, html calc = html.TABLE() calc <= html.TR(html.TH(html.DIV("0", id="result"), colspan=3) + html.TH("C", id="clear")) lines = ["789/", "456*", "123-", "0.=+"] calc <= (html.TR(html.TD(x) for x in line) for line in lines) document <= calc Note the use of Python generators to reduce the program size, while keeping it readable. Let's add style to the <TD> tags in a stylesheet so that the calculator looks better: <style> *{ font-family: sans-serif; font-weight: normal; font-size: 1.1em; } td{ background-color: #ccc; padding: 10px 30px 10px 30px; border-radius: 0.2em; text-align: center; cursor: default; } #result{ border-color: #000; border-width: 1px; border-style: solid; padding: 10px 30px 10px 30px; text-align: right; } </style>

Event handling

The next step is to trigger an action when the user presses the calculator keys: for digits and operations, print the digit or operation in the result zone; for the = sign, execute the operation and print the result, or an error message if the input is invalid; for the C key, reset the result zone. To handle the elements printed in the page, the program first needs to get a reference to them. The buttons have been created as <TD> tags; to get a reference to all these tags, the syntax is document.select("td") The result of select() is always a list of elements. The events that can occur on the elements of a page have a normalized name: when the user clicks on a button, the event called "click" is triggered. In the program, this event will provoke the execution of a function. The association between element, event and function is defined by the syntax element.bind("click", action) For the calculator, we can associate the same function to the "click" event on all buttons with: for button in document.select("td"): button.bind("click", action) To comply with Python syntax, the function action() must have been defined earlier in the program. Such "callback" functions take a single parameter, an object that represents the event.

Complete program

Here is the code that manages a minimal version of the calculator. The most important part is in the function action(event). from browser import document, html # Construction de la calculatrice calc = html.TABLE() calc <= html.TR(html.TH(html.DIV("0", id="result"), colspan=3) + html.TD("C")) lines = ["789/", "456*", "123-", "0.=+"] calc <= (html.TR(html.TD(x) for x in line) for line in lines) document <= calc result = document["result"] # direct acces to an element by its id def action(event): """Handles the "click" event on a button of the calculator.""" # The element the user clicked on is the attribute "target" of the # event object element = event.target # The text printed on the button is the element's "text" attribute value = element.text if value not in "=C": # update the result zone if result.text in ["0", "error"]: result.text = value else: result.text = result.text + value elif value == "C": # reset result.text = "0" elif value == "=": # execute the formula in result zone try: result.text = eval(result.text) except: result.text = "error" # Associate function action() to the event "click" on all buttons for button in document.select("td"): button.bind("click", action)

Python in the browser with Brython

Python In The Browser <script src="https://cdnjs.cloudflare.com/ajax/libs/brython/3.8.8/brython.js" integrity="sha256-rA89wPrTJJQFWJaZveKW8jpdmC3t5F9rRkPyBjz8G04=" crossorigin="anonymous"></script> <script src="https://cdnjs.cloudflare.com/ajax/libs/brython/3.8.8/brython_stdlib.js" integrity="sha256-Gnrw9tIjrsXcZSCh/wos5Jrpn0bNVNFJuNJI9d71TDs=" crossorigin="anonymous"></script> <body onload="brython()"> <h1>Brython Crash Course</h1> <h2 id="hello"></h2> <button id="alert-btn">Alert & Insert</button> <input type="text" id="text" placeholder="Enter something"> <span id="output"></span> <h2 id="greet">Hello {name}</h2> <button id="joke-btn">Get Joke</button> <div id="joke" class="card">Click the "get joke" button</div> <input type="file" id="file-upload"> <br> <textarea id="file-text" cols="60" rows="10"></textarea> <div class="card"> <button id="rotate-btn">Rotate</button> <div id="rotate-box" class="box"></div> </div> <h2>Saved Item: <span id="item"></span></h2> <input type="text" id="item-input" placeholder="Add to local storage"> <button id="add-btn" style="display: inline;">Add</button> <button id="remove-btn" style="display: inline;">Remove</button> <!-- Alert & DOM insert --> <script type="text/python" id="script0"> from browser import document, console, alert def show(e): console.log(e) alert('Hello World') document['hello'] <= 'Hello World' document['alert-btn'].bind('click', show) </script> <!-- Text bind --> <script type="text/python" id="script1"> from browser import document def show_text(e): document['output'].textContent = e.target.value; document['text'].bind('input', show_text) </script> <!-- Template and variable --> <script type="text/python" id="script2"> from browser import document from browser.template import Template Template(document['greet']).render(name='Brad') </script> <!-- Ajax call --> <script type="text/python" id="script3"> from browser import document, ajax url = 'https://api.chucknorris.io/jokes/random' def on_complete(req): import json data = json.loads(req.responseText) joke = data['value'] document['joke'].text = joke def get_joke(e): req = ajax.ajax() req.open('GET', url, True) req.bind('complete', on_complete) document['joke'].text = 'Loading...' req.send() document['joke-btn'].bind('click', get_joke) </script> <!-- Load file data --> <script type="text/python" id="script4"> from browser import document, window def file_read(e): def onload(e): document['file-text'].value = e.target.result file = document['file-upload'].files[0] reader = window.FileReader.new() reader.readAsText(file) reader.bind('load', onload) document['file-upload'].bind('input', file_read) </script> <!-- Rotate - manipulate style --> <script type="text/python" id="script5"> from browser import document, html box = document['rotate-box'] angle = 10 def change(e): global angle box.style.transform = f"rotate({angle}deg)" angle += 10 document['rotate-btn'].bind('click', change) </script> <!-- Local storage --> <script type="text/python" id="script6"> from browser import document, html, window, console storage = window.localStorage if storage.getItem('item'): document['item'] <= storage.getItem('item') def add_item(e): item = document['item-input'].value storage.setItem('item', item) document['item'].textContent = item def remove_item(e): storage.removeItem('item') document['item'].textContent = '' document['add-btn'].bind('click', add_item) document['remove-btn'].bind('click', remove_item) </script> </body> </html>

Python Selenium

selenium with Tim Python Selenium Tutorial #1 - Web Scraping, Bots & Testing Locating Elements From HTML Selenium with Python
from selenium import webdriver  # note! this file cannot be named selenium.py, or it would shadow the library
PATH = r"D:\Python36-32\chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.get("https://williamkpchan.github.io/LibDocs/python%20notes.html")
# driver.close()  # closes only the current tab if more than one tab is open
print(driver.title)
driver.quit()
Tech with Tim sample:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

PATH = r"Program Files\Chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.get("https://techwithtim.net")
print(driver.title)

search = driver.find_element_by_name("s")
search.send_keys("test")
search.send_keys(Keys.RETURN)

try:
    main = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "main"))
    )
    articles = main.find_elements_by_tag_name("article")
    for article in articles:
        header = article.find_element_by_class_name("entry-summary")
        print(header.text)
finally:
    driver.quit()

Python call an external command

import subprocess subprocess.run(["ls", "-l"]) import os os.system("your command") stream = os.popen("some_command with args") subprocess.call(['ping', 'localhost']) print(subprocess.Popen("echo Hello World", shell=True, stdout=subprocess.PIPE).stdout.read()) print(os.popen("echo Hello World").read()) return_code = subprocess.call("echo Hello World", shell=True) print(subprocess.Popen("echo %s" % user_input, shell=True, stdout=subprocess.PIPE).stdout.read()) import subprocess p = subprocess.Popen('ls', shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT) for line in p.stdout.readlines(): print(line, end='') retval = p.wait()
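For new code, subprocess.run() is usually the simplest option; on Python 3.7+ it can capture output directly. A minimal sketch (the -c flag assumes a Unix-style ping; on Windows the equivalent is -n):

import subprocess

# Run a command, capture stdout/stderr as text, and raise CalledProcessError on failure
result = subprocess.run(["ping", "-c", "1", "localhost"],
                        capture_output=True, text=True, check=True)
print(result.returncode)   # 0 on success
print(result.stdout)       # captured standard output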

Python Code for Common Matplotlib Charts

# !pip install brewer2mpl import numpy as np import pandas as pd import matplotlib as mpl import matplotlib.pyplot as plt import seaborn as sns import warnings; warnings.filterwarnings(action='once') large = 22; med = 16; small = 12 params = {'axes.titlesize': large, 'legend.fontsize': med, 'figure.figsize': (16, 10), 'axes.labelsize': med, 'axes.titlesize': med, 'xtick.labelsize': med, 'ytick.labelsize': med, 'figure.titlesize': large} plt.rcParams.update(params) plt.style.use('seaborn-whitegrid') sns.set_style("white") %matplotlib inline # Version print(mpl.__version__) #> 3.0.0 print(sns.__version__) #> 0.9.0 1. 散点图 Scatteplot是用于研究两个变量之间关系的经典和基本图。 如果数据中有多个组,则可能需要以不同颜色可视化每个组。 在Matplotlib,你可以方便地使用。 # Import dataset midwest = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/midwest_filter.csv") # Prepare Data # Create as many colors as there are unique midwest['category'] categories = np.unique(midwest['category']) colors = [plt.cm.tab10(i/float(len(categories)-1)) for i in range(len(categories))] # Draw Plot for Each Category plt.figure(figsize=(16, 10), dpi= 80, facecolor='w', edgecolor='k') for i, category in enumerate(categories): plt.scatter('area', 'poptotal', data=midwest.loc[midwest.category==category, :], s=20, c=colors[i], label=str(category)) # Decorations plt.gca().set(xlim=(0.0, 0.1), ylim=(0, 90000), xlabel='Area', ylabel='Population') plt.xticks(fontsize=12); plt.yticks(fontsize=12) plt.title("Scatterplot of Midwest Area vs Population", fontsize=22) plt.legend(fontsize=12) plt.show() 2. 带边界的气泡图 有时,您希望在边界内显示一组点以强调其重要性。 在此示例中,您将从应该被环绕的数据帧中获取记录,并将其传递给下面的代码中描述的记录。 encircle() from matplotlib import patches from scipy.spatial import ConvexHull import warnings; warnings.simplefilter('ignore') sns.set_style("white") # Step 1: Prepare Data midwest = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/midwest_filter.csv") # As many colors as there are unique midwest['category'] categories = np.unique(midwest['category']) colors = [plt.cm.tab10(i/float(len(categories)-1)) for i in range(len(categories))] # Step 2: Draw Scatterplot with unique color for each category fig = plt.figure(figsize=(16, 10), dpi= 80, facecolor='w', edgecolor='k') for i, category in enumerate(categories): plt.scatter('area', 'poptotal', data=midwest.loc[midwest.category==category, :], s='dot_size', c=colors[i], label=str(category), edgecolors='black', linewidths=.5) # Step 3: Encircling # https://stackoverflow.com/questions/44575681/how-do-i-encircle-different-data-sets-in-scatter-plot def encircle(x,y, ax=None, **kw): if not ax: ax=plt.gca() p = np.c_[x,y] hull = ConvexHull(p) poly = plt.Polygon(p[hull.vertices,:], **kw) ax.add_patch(poly) # Select data to be encircled midwest_encircle_data = midwest.loc[midwest.state=='IN', :] # Draw polygon surrounding vertices encircle(midwest_encircle_data.area, midwest_encircle_data.poptotal, ec="k", fc="gold", alpha=0.1) encircle(midwest_encircle_data.area, midwest_encircle_data.poptotal, ec="firebrick", fc="none", linewidth=1.5) # Step 4: Decorations plt.gca().set(xlim=(0.0, 0.1), ylim=(0, 90000), xlabel='Area', ylabel='Population') plt.xticks(fontsize=12); plt.yticks(fontsize=12) plt.title("Bubble Plot with Encircling", fontsize=22) plt.legend(fontsize=12) plt.show() 3. 
带线性回归最佳拟合线的散点图 如果你想了解两个变量如何相互改变,那么最合适的线就是要走的路。 下图显示了数据中各组之间最佳拟合线的差异。 要禁用分组并仅为整个数据集绘制一条最佳拟合线,请从下面的调用中删除该参数。 # Import Data df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv") df_select = df.loc[df.cyl.isin([4,8]), :] # Plot sns.set_style("white") gridobj = sns.lmplot(x="displ", y="hwy", hue="cyl", data=df_select, height=7, aspect=1.6, robust=True, palette='tab10', scatter_kws=dict(s=60, linewidths=.7, edgecolors='black')) # Decorations gridobj.set(xlim=(0.5, 7.5), ylim=(0, 50)) plt.title("Scatterplot with line of best fit grouped by number of cylinders", fontsize=20) 每个回归线都在自己的列中 或者,您可以在其自己的列中显示每个组的最佳拟合线。 你可以通过在里面设置参数来实现这一点。 # Import Data df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv") df_select = df.loc[df.cyl.isin([4,8]), :] # Each line in its own column sns.set_style("white") gridobj = sns.lmplot(x="displ", y="hwy", data=df_select, height=7, robust=True, palette='Set1', col="cyl", scatter_kws=dict(s=60, linewidths=.7, edgecolors='black')) # Decorations gridobj.set(xlim=(0.5, 7.5), ylim=(0, 50)) plt.show() 4. 抖动图 通常,多个数据点具有完全相同的X和Y值。 结果,多个点相互绘制并隐藏。 为避免这种情况,请稍微抖动点,以便您可以直观地看到它们。 这很方便使用 # Import Data df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv") # Draw Stripplot fig, ax = plt.subplots(figsize=(16,10), dpi= 80) sns.stripplot(df.cty, df.hwy, jitter=0.25, size=8, ax=ax, linewidth=.5) # Decorations plt.title('Use jittered plots to avoid overlapping of points', fontsize=22) plt.show() 5. 计数图 避免点重叠问题的另一个选择是增加点的大小,这取决于该点中有多少点。 因此,点的大小越大,周围的点的集中度就越大。 # Import Data df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv") df_counts = df.groupby(['hwy', 'cty']).size().reset_index(name='counts') # Draw Stripplot fig, ax = plt.subplots(figsize=(16,10), dpi= 80) sns.stripplot(df_counts.cty, df_counts.hwy, size=df_counts.counts*2, ax=ax) # Decorations plt.title('Counts Plot - Size of circle is bigger as more points overlap', fontsize=22) plt.show() 6. 
边缘直方图 边缘直方图具有沿X和Y轴变量的直方图。 这用于可视化X和Y之间的关系以及单独的X和Y的单变量分布。 该图如果经常用于探索性数据分析(EDA)。 # Import Data df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv") # Create Fig and gridspec fig = plt.figure(figsize=(16, 10), dpi= 80) grid = plt.GridSpec(4, 4, hspace=0.5, wspace=0.2) # Define the axes ax_main = fig.add_subplot(grid[:-1, :-1]) ax_right = fig.add_subplot(grid[:-1, -1], xticklabels=[], yticklabels=[]) ax_bottom = fig.add_subplot(grid[-1, 0:-1], xticklabels=[], yticklabels=[]) # Scatterplot on main ax ax_main.scatter('displ', 'hwy', s=df.cty*4, c=df.manufacturer.astype('category').cat.codes, alpha=.9, data=df, cmap="tab10", edgecolors='gray', linewidths=.5) # histogram on the right ax_bottom.hist(df.displ, 40, histtype='stepfilled', orientation='vertical', color='deeppink') ax_bottom.invert_yaxis() # histogram in the bottom ax_right.hist(df.hwy, 40, histtype='stepfilled', orientation='horizontal', color='deeppink') # Decorations ax_main.set(title='Scatterplot with Histograms displ vs hwy', xlabel='displ', ylabel='hwy') ax_main.title.set_fontsize(20) for item in ([ax_main.xaxis.label, ax_main.yaxis.label] + ax_main.get_xticklabels() + ax_main.get_yticklabels()): item.set_fontsize(14) xlabels = ax_main.get_xticks().tolist() ax_main.set_xticklabels(xlabels) plt.show() 7.边缘箱形图 边缘箱图与边缘直方图具有相似的用途。 然而,箱线图有助于精确定位X和Y的中位数,第25和第75百分位数。 # Import Data df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv") # Create Fig and gridspec fig = plt.figure(figsize=(16, 10), dpi= 80) grid = plt.GridSpec(4, 4, hspace=0.5, wspace=0.2) # Define the axes ax_main = fig.add_subplot(grid[:-1, :-1]) ax_right = fig.add_subplot(grid[:-1, -1], xticklabels=[], yticklabels=[]) ax_bottom = fig.add_subplot(grid[-1, 0:-1], xticklabels=[], yticklabels=[]) # Scatterplot on main ax ax_main.scatter('displ', 'hwy', s=df.cty*5, c=df.manufacturer.astype('category').cat.codes, alpha=.9, data=df, cmap="Set1", edgecolors='black', linewidths=.5) # Add a graph in each part sns.boxplot(df.hwy, ax=ax_right, orient="v") sns.boxplot(df.displ, ax=ax_bottom, orient="h") # Decorations ------------------ # Remove x axis name for the boxplot ax_bottom.set(xlabel='') ax_right.set(ylabel='') # Main Title, Xlabel and YLabel ax_main.set(title='Scatterplot with Histograms displ vs hwy', xlabel='displ', ylabel='hwy') # Set font size of different components ax_main.title.set_fontsize(20) for item in ([ax_main.xaxis.label, ax_main.yaxis.label] + ax_main.get_xticklabels() + ax_main.get_yticklabels()): item.set_fontsize(14) plt.show() 8. 相关图 Correlogram用于直观地查看给定数据帧(或2D数组)中所有可能的数值变量对之间的相关度量。 # Import Dataset df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv") # Plot plt.figure(figsize=(12,10), dpi= 80) sns.heatmap(df.corr(), xticklabels=df.corr().columns, yticklabels=df.corr().columns, cmap='RdYlGn', center=0, annot=True) # Decorations plt.title('Correlogram of mtcars', fontsize=22) plt.xticks(fontsize=12) plt.yticks(fontsize=12) plt.show() 9. 矩阵图 成对图是探索性分析中的最爱,以理解所有可能的数字变量对之间的关系。 它是双变量分析的必备工具。 # Load Dataset df = sns.load_dataset('iris') # Plot plt.figure(figsize=(10,8), dpi= 80) sns.pairplot(df, kind="scatter", hue="species", plot_kws=dict(s=80, edgecolor="white", linewidth=2.5)) plt.show() # Load Dataset df = sns.load_dataset('iris') # Plot plt.figure(figsize=(10,8), dpi= 80) sns.pairplot(df, kind="reg", hue="species") plt.show() 偏差 10. 
发散型条形图 如果您想根据单个指标查看项目的变化情况,并可视化此差异的顺序和数量,那么发散条是一个很好的工具。 它有助于快速区分数据中组的性能,并且非常直观,并且可以立即传达这一点。 # Prepare Data df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv") x = df.loc[:, ['mpg']] df['mpg_z'] = (x - x.mean())/x.std() df['colors'] = ['red' if x < 0 else 'green' for x in df['mpg_z']] df.sort_values('mpg_z', inplace=True) df.reset_index(inplace=True) # Draw plot plt.figure(figsize=(14,10), dpi= 80) plt.hlines(y=df.index, xmin=0, xmax=df.mpg_z, color=df.colors, alpha=0.4, linewidth=5) # Decorations plt.gca().set(ylabel='$Model$', xlabel='$Mileage$') plt.yticks(df.index, df.cars, fontsize=12) plt.title('Diverging Bars of Car Mileage', fontdict={'size':20}) plt.grid(linestyle='--', alpha=0.5) plt.show() 11. 发散型文本 分散的文本类似于发散条,如果你想以一种漂亮和可呈现的方式显示图表中每个项目的价值,它更喜欢。 # Prepare Data df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv") x = df.loc[:, ['mpg']] df['mpg_z'] = (x - x.mean())/x.std() df['colors'] = ['red' if x < 0 else 'green' for x in df['mpg_z']] df.sort_values('mpg_z', inplace=True) df.reset_index(inplace=True) # Draw plot plt.figure(figsize=(14,14), dpi= 80) plt.hlines(y=df.index, xmin=0, xmax=df.mpg_z) for x, y, tex in zip(df.mpg_z, df.index, df.mpg_z): t = plt.text(x, y, round(tex, 2), horizontalalignment='right' if x < 0 else 'left', verticalalignment='center', fontdict={'color':'red' if x < 0 else 'green', 'size':14}) # Decorations plt.yticks(df.index, df.cars, fontsize=12) plt.title('Diverging Text Bars of Car Mileage', fontdict={'size':20}) plt.grid(linestyle='--', alpha=0.5) plt.xlim(-2.5, 2.5) plt.show() 12. 发散型包点图 发散点图也类似于发散条。 然而,与发散条相比,条的不存在减少了组之间的对比度和差异。 # Prepare Data df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv") x = df.loc[:, ['mpg']] df['mpg_z'] = (x - x.mean())/x.std() df['colors'] = ['red' if x < 0 else 'darkgreen' for x in df['mpg_z']] df.sort_values('mpg_z', inplace=True) df.reset_index(inplace=True) # Draw plot plt.figure(figsize=(14,16), dpi= 80) plt.scatter(df.mpg_z, df.index, s=450, alpha=.6, color=df.colors) for x, y, tex in zip(df.mpg_z, df.index, df.mpg_z): t = plt.text(x, y, round(tex, 1), horizontalalignment='center', verticalalignment='center', fontdict={'color':'white'}) # Decorations # Lighten borders plt.gca().spines["top"].set_alpha(.3) plt.gca().spines["bottom"].set_alpha(.3) plt.gca().spines["right"].set_alpha(.3) plt.gca().spines["left"].set_alpha(.3) plt.yticks(df.index, df.cars) plt.title('Diverging Dotplot of Car Mileage', fontdict={'size':20}) plt.xlabel('$Mileage$') plt.grid(linestyle='--', alpha=0.5) plt.xlim(-2.5, 2.5) plt.show() 13. 
带标记的发散型棒棒糖图 带标记的棒棒糖通过强调您想要引起注意的任何重要数据点并在图表中适当地给出推理,提供了一种可视化分歧的灵活方式。 # Prepare Data df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv") x = df.loc[:, ['mpg']] df['mpg_z'] = (x - x.mean())/x.std() df['colors'] = 'black' # color fiat differently df.loc[df.cars == 'Fiat X1-9', 'colors'] = 'darkorange' df.sort_values('mpg_z', inplace=True) df.reset_index(inplace=True) # Draw plot import matplotlib.patches as patches plt.figure(figsize=(14,16), dpi= 80) plt.hlines(y=df.index, xmin=0, xmax=df.mpg_z, color=df.colors, alpha=0.4, linewidth=1) plt.scatter(df.mpg_z, df.index, color=df.colors, s=[600 if x == 'Fiat X1-9' else 300 for x in df.cars], alpha=0.6) plt.yticks(df.index, df.cars) plt.xticks(fontsize=12) # Annotate plt.annotate('Mercedes Models', xy=(0.0, 11.0), xytext=(1.0, 11), xycoords='data', fontsize=15, ha='center', va='center', bbox=dict(boxstyle='square', fc='firebrick'), arrowprops=dict(arrowstyle='-[, widthB=2.0, lengthB=1.5', lw=2.0, color='steelblue'), color='white') # Add Patches p1 = patches.Rectangle((-2.0, -1), width=.3, height=3, alpha=.2, facecolor='red') p2 = patches.Rectangle((1.5, 27), width=.8, height=5, alpha=.2, facecolor='green') plt.gca().add_patch(p1) plt.gca().add_patch(p2) # Decorate plt.title('Diverging Bars of Car Mileage', fontdict={'size':20}) plt.grid(linestyle='--', alpha=0.5) plt.show() 14.面积图 通过对轴和线之间的区域进行着色,区域图不仅强调峰值和低谷,而且还强调高点和低点的持续时间。 高点持续时间越长,线下面积越大。 import numpy as np import pandas as pd # Prepare Data df = pd.read_csv("https://github.com/selva86/datasets/raw/master/economics.csv", parse_dates=['date']).head(100) x = np.arange(df.shape[0]) y_returns = (df.psavert.diff().fillna(0)/df.psavert.shift(1)).fillna(0) * 100 # Plot plt.figure(figsize=(16,10), dpi= 80) plt.fill_between(x[1:], y_returns[1:], 0, where=y_returns[1:] >= 0, facecolor='green', interpolate=True, alpha=0.7) plt.fill_between(x[1:], y_returns[1:], 0, where=y_returns[1:] <= 0, facecolor='red', interpolate=True, alpha=0.7) # Annotate plt.annotate('Peak 1975', xy=(94.0, 21.0), xytext=(88.0, 28), bbox=dict(boxstyle='square', fc='firebrick'), arrowprops=dict(facecolor='steelblue', shrink=0.05), fontsize=15, color='white') # Decorations xtickvals = [str(m)[:3].upper()+"-"+str(y) for y,m in zip(df.date.dt.year, df.date.dt.month_name())] plt.gca().set_xticks(x[::6]) plt.gca().set_xticklabels(xtickvals[::6], rotation=90, fontdict={'horizontalalignment': 'center', 'verticalalignment': 'center_baseline'}) plt.ylim(-35,35) plt.xlim(1,100) plt.title("Month Economics Return %", fontsize=22) plt.ylabel('Monthly returns %') plt.grid(alpha=0.5) plt.show() 15. 
有序条形图 有序条形图有效地传达了项目的排名顺序。 但是,在图表上方添加度量标准的值,用户可以从图表本身获取精确信息。 # Prepare Data df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv") df = df_raw[['cty', 'manufacturer']].groupby('manufacturer').apply(lambda x: x.mean()) df.sort_values('cty', inplace=True) df.reset_index(inplace=True) # Draw plot import matplotlib.patches as patches fig, ax = plt.subplots(figsize=(16,10), facecolor='white', dpi= 80) ax.vlines(x=df.index, ymin=0, ymax=df.cty, color='firebrick', alpha=0.7, linewidth=20) # Annotate Text for i, cty in enumerate(df.cty): ax.text(i, cty+0.5, round(cty, 1), horizontalalignment='center') # Title, Label, Ticks and Ylim ax.set_title('Bar Chart for Highway Mileage', fontdict={'size':22}) ax.set(ylabel='Miles Per Gallon', ylim=(0, 30)) plt.xticks(df.index, df.manufacturer.str.upper(), rotation=60, horizontalalignment='right', fontsize=12) # Add patches to color the X axis labels p1 = patches.Rectangle((.57, -0.005), width=.33, height=.13, alpha=.1, facecolor='green', transform=fig.transFigure) p2 = patches.Rectangle((.124, -0.005), width=.446, height=.13, alpha=.1, facecolor='red', transform=fig.transFigure) fig.add_artist(p1) fig.add_artist(p2) plt.show() 16. 棒棒糖图 棒棒糖图表以一种视觉上令人愉悦的方式提供与有序条形图类似的目的。 # Prepare Data df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv") df = df_raw[['cty', 'manufacturer']].groupby('manufacturer').apply(lambda x: x.mean()) df.sort_values('cty', inplace=True) df.reset_index(inplace=True) # Draw plot fig, ax = plt.subplots(figsize=(16,10), dpi= 80) ax.vlines(x=df.index, ymin=0, ymax=df.cty, color='firebrick', alpha=0.7, linewidth=2) ax.scatter(x=df.index, y=df.cty, s=75, color='firebrick', alpha=0.7) # Title, Label, Ticks and Ylim ax.set_title('Lollipop Chart for Highway Mileage', fontdict={'size':22}) ax.set_ylabel('Miles Per Gallon') ax.set_xticks(df.index) ax.set_xticklabels(df.manufacturer.str.upper(), rotation=60, fontdict={'horizontalalignment': 'right', 'size':12}) ax.set_ylim(0, 30) # Annotate for row in df.itertuples(): ax.text(row.Index, row.cty+.5, s=round(row.cty, 2), horizontalalignment= 'center', verticalalignment='bottom', fontsize=14) plt.show() 17. 包点图 点图表传达了项目的排名顺序。 由于它沿水平轴对齐,因此您可以更容易地看到点彼此之间的距离。 # Prepare Data df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv") df = df_raw[['cty', 'manufacturer']].groupby('manufacturer').apply(lambda x: x.mean()) df.sort_values('cty', inplace=True) df.reset_index(inplace=True) # Draw plot fig, ax = plt.subplots(figsize=(16,10), dpi= 80) ax.hlines(y=df.index, xmin=11, xmax=26, color='gray', alpha=0.7, linewidth=1, linestyles='dashdot') ax.scatter(y=df.index, x=df.cty, s=75, color='firebrick', alpha=0.7) # Title, Label, Ticks and Ylim ax.set_title('Dot Plot for Highway Mileage', fontdict={'size':22}) ax.set_xlabel('Miles Per Gallon') ax.set_yticks(df.index) ax.set_yticklabels(df.manufacturer.str.title(), fontdict={'horizontalalignment': 'right'}) ax.set_xlim(10, 27) plt.show() 18. 
坡度图 斜率图最适合比较给定人/项目的“之前”和“之后”位置。 import matplotlib.lines as mlines # Import Data df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/gdppercap.csv") left_label = [str(c) + ', '+ str(round(y)) for c, y in zip(df.continent, df['1952'])] right_label = [str(c) + ', '+ str(round(y)) for c, y in zip(df.continent, df['1957'])] klass = ['red' if (y1-y2) < 0 else 'green' for y1, y2 in zip(df['1952'], df['1957'])] # draw line # https://stackoverflow.com/questions/36470343/how-to-draw-a-line-with-matplotlib/36479941 def newline(p1, p2, color='black'): ax = plt.gca() l = mlines.Line2D([p1[0],p2[0]], [p1[1],p2[1]], color='red' if p1[1]-p2[1] > 0 else 'green', marker='o', markersize=6) ax.add_line(l) return l fig, ax = plt.subplots(1,1,figsize=(14,14), dpi= 80) # Vertical Lines ax.vlines(x=1, ymin=500, ymax=13000, color='black', alpha=0.7, linewidth=1, linestyles='dotted') ax.vlines(x=3, ymin=500, ymax=13000, color='black', alpha=0.7, linewidth=1, linestyles='dotted') # Points ax.scatter(y=df['1952'], x=np.repeat(1, df.shape[0]), s=10, color='black', alpha=0.7) ax.scatter(y=df['1957'], x=np.repeat(3, df.shape[0]), s=10, color='black', alpha=0.7) # Line Segmentsand Annotation for p1, p2, c in zip(df['1952'], df['1957'], df['continent']): newline([1,p1], [3,p2]) ax.text(1-0.05, p1, c + ', ' + str(round(p1)), horizontalalignment='right', verticalalignment='center', fontdict={'size':14}) ax.text(3+0.05, p2, c + ', ' + str(round(p2)), horizontalalignment='left', verticalalignment='center', fontdict={'size':14}) # 'Before' and 'After' Annotations ax.text(1-0.05, 13000, 'BEFORE', horizontalalignment='right', verticalalignment='center', fontdict={'size':18, 'weight':700}) ax.text(3+0.05, 13000, 'AFTER', horizontalalignment='left', verticalalignment='center', fontdict={'size':18, 'weight':700}) # Decoration ax.set_title("Slopechart: Comparing GDP Per Capita between 1952 vs 1957", fontdict={'size':22}) ax.set(xlim=(0,4), ylim=(0,14000), ylabel='Mean GDP Per Capita') ax.set_xticks([1,3]) ax.set_xticklabels(["1952", "1957"]) plt.yticks(np.arange(500, 13000, 2000), fontsize=12) # Lighten borders plt.gca().spines["top"].set_alpha(.0) plt.gca().spines["bottom"].set_alpha(.0) plt.gca().spines["right"].set_alpha(.0) plt.gca().spines["left"].set_alpha(.0) plt.show() 19. 
哑铃图 哑铃图传达各种项目的“前”和“后”位置以及项目的排序。 如果您想要将特定项目/计划对不同对象的影响可视化,那么它非常有用。 import matplotlib.lines as mlines # Import Data df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/health.csv") df.sort_values('pct_2014', inplace=True) df.reset_index(inplace=True) # Func to draw line segment def newline(p1, p2, color='black'): ax = plt.gca() l = mlines.Line2D([p1[0],p2[0]], [p1[1],p2[1]], color='skyblue') ax.add_line(l) return l # Figure and Axes fig, ax = plt.subplots(1,1,figsize=(14,14), facecolor='#f7f7f7', dpi= 80) # Vertical Lines ax.vlines(x=.05, ymin=0, ymax=26, color='black', alpha=1, linewidth=1, linestyles='dotted') ax.vlines(x=.10, ymin=0, ymax=26, color='black', alpha=1, linewidth=1, linestyles='dotted') ax.vlines(x=.15, ymin=0, ymax=26, color='black', alpha=1, linewidth=1, linestyles='dotted') ax.vlines(x=.20, ymin=0, ymax=26, color='black', alpha=1, linewidth=1, linestyles='dotted') # Points ax.scatter(y=df['index'], x=df['pct_2013'], s=50, color='#0e668b', alpha=0.7) ax.scatter(y=df['index'], x=df['pct_2014'], s=50, color='#a3c4dc', alpha=0.7) # Line Segments for i, p1, p2 in zip(df['index'], df['pct_2013'], df['pct_2014']): newline([p1, i], [p2, i]) # Decoration ax.set_facecolor('#f7f7f7') ax.set_title("Dumbell Chart: Pct Change - 2013 vs 2014", fontdict={'size':22}) ax.set(xlim=(0,.25), ylim=(-1, 27), ylabel='Mean GDP Per Capita') ax.set_xticks([.05, .1, .15, .20]) ax.set_xticklabels(['5%', '15%', '20%', '25%']) ax.set_xticklabels(['5%', '15%', '20%', '25%']) plt.show() 20. 连续变量的直方图 直方图显示给定变量的频率分布。 下面的表示基于分类变量对频率条进行分组,从而更好地了解连续变量和串联变量。 # Import Data df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv") # Prepare data x_var = 'displ' groupby_var = 'class' df_agg = df.loc[:, [x_var, groupby_var]].groupby(groupby_var) vals = [df[x_var].values.tolist() for i, df in df_agg] # Draw plt.figure(figsize=(16,9), dpi= 80) colors = [plt.cm.Spectral(i/float(len(vals)-1)) for i in range(len(vals))] n, bins, patches = plt.hist(vals, 30, stacked=True, density=False, color=colors[:len(vals)]) # Decoration plt.legend({group:col for group, col in zip(np.unique(df[groupby_var]).tolist(), colors[:len(vals)])}) plt.title(f"Stacked Histogram of ${x_var}$ colored by ${groupby_var}$", fontsize=22) plt.xlabel(x_var) plt.ylabel("Frequency") plt.ylim(0, 25) plt.xticks(ticks=bins[::3], labels=[round(b,1) for b in bins[::3]]) plt.show() 21. 类型变量的直方图 分类变量的直方图显示该变量的频率分布。 通过对条形图进行着色,您可以将分布与表示颜色的另一个分类变量相关联。 # Import Data df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv") # Prepare data x_var = 'manufacturer' groupby_var = 'class' df_agg = df.loc[:, [x_var, groupby_var]].groupby(groupby_var) vals = [df[x_var].values.tolist() for i, df in df_agg] # Draw plt.figure(figsize=(16,9), dpi= 80) colors = [plt.cm.Spectral(i/float(len(vals)-1)) for i in range(len(vals))] n, bins, patches = plt.hist(vals, df[x_var].unique().__len__(), stacked=True, density=False, color=colors[:len(vals)]) # Decoration plt.legend({group:col for group, col in zip(np.unique(df[groupby_var]).tolist(), colors[:len(vals)])}) plt.title(f"Stacked Histogram of ${x_var}$ colored by ${groupby_var}$", fontsize=22) plt.xlabel(x_var) plt.ylabel("Frequency") plt.ylim(0, 40) plt.xticks(ticks=bins, labels=np.unique(df[x_var]).tolist(), rotation=90, horizontalalignment='left') plt.show() 22. 
密度图 密度图是一种常用工具,可视化连续变量的分布。 通过“响应”变量对它们进行分组,您可以检查X和Y之间的关系。 以下情况,如果出于代表性目的来描述城市里程的分布如何随着汽缸数的变化而变化。 # Import Data df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv") # Draw Plot plt.figure(figsize=(16,10), dpi= 80) sns.kdeplot(df.loc[df['cyl'] == 4, "cty"], shade=True, color="g", label="Cyl=4", alpha=.7) sns.kdeplot(df.loc[df['cyl'] == 5, "cty"], shade=True, color="deeppink", label="Cyl=5", alpha=.7) sns.kdeplot(df.loc[df['cyl'] == 6, "cty"], shade=True, color="dodgerblue", label="Cyl=6", alpha=.7) sns.kdeplot(df.loc[df['cyl'] == 8, "cty"], shade=True, color="orange", label="Cyl=8", alpha=.7) # Decoration plt.title('Density Plot of City Mileage by n_Cylinders', fontsize=22) plt.legend() 23. 直方密度线图 带有直方图的密度曲线将两个图表传达的集体信息汇集在一起,这样您就可以将它们放在一个图形而不是两个图形中。 # Import Data df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv") # Draw Plot plt.figure(figsize=(13,10), dpi= 80) sns.distplot(df.loc[df['class'] == 'compact', "cty"], color="dodgerblue", label="Compact", hist_kws={'alpha':.7}, kde_kws={'linewidth':3}) sns.distplot(df.loc[df['class'] == 'suv', "cty"], color="orange", label="SUV", hist_kws={'alpha':.7}, kde_kws={'linewidth':3}) sns.distplot(df.loc[df['class'] == 'minivan', "cty"], color="g", label="minivan", hist_kws={'alpha':.7}, kde_kws={'linewidth':3}) plt.ylim(0, 0.35) # Decoration plt.title('Density Plot of City Mileage by Vehicle Type', fontsize=22) plt.legend() plt.show() 24. Joy Plot Joy Plot允许不同组的密度曲线重叠,这是一种可视化相对于彼此的大量组的分布的好方法。 它看起来很悦目,并清楚地传达了正确的信息。 它可以使用joypy基于的包来轻松构建matplotlib。 # !pip install joypy # Import Data mpg = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv") # Draw Plot plt.figure(figsize=(16,10), dpi= 80) fig, axes = joypy.joyplot(mpg, column=['hwy', 'cty'], by="class", ylim='own', figsize=(14,10)) # Decoration plt.title('Joy Plot of City and Highway Mileage by Class', fontsize=22) plt.show() 25. 
分布式点图 分布点图显示按组分割的点的单变量分布。 点数越暗,该区域的数据点集中度越高。 通过对中位数进行不同着色,组的真实定位立即变得明显。 import matplotlib.patches as mpatches # Prepare Data df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv") cyl_colors = {4:'tab:red', 5:'tab:green', 6:'tab:blue', 8:'tab:orange'} df_raw['cyl_color'] = df_raw.cyl.map(cyl_colors) # Mean and Median city mileage by make df = df_raw[['cty', 'manufacturer']].groupby('manufacturer').apply(lambda x: x.mean()) df.sort_values('cty', ascending=False, inplace=True) df.reset_index(inplace=True) df_median = df_raw[['cty', 'manufacturer']].groupby('manufacturer').apply(lambda x: x.median()) # Draw horizontal lines fig, ax = plt.subplots(figsize=(16,10), dpi= 80) ax.hlines(y=df.index, xmin=0, xmax=40, color='gray', alpha=0.5, linewidth=.5, linestyles='dashdot') # Draw the Dots for i, make in enumerate(df.manufacturer): df_make = df_raw.loc[df_raw.manufacturer==make, :] ax.scatter(y=np.repeat(i, df_make.shape[0]), x='cty', data=df_make, s=75, edgecolors='gray', c='w', alpha=0.5) ax.scatter(y=i, x='cty', data=df_median.loc[df_median.index==make, :], s=75, c='firebrick') # Annotate ax.text(33, 13, "$red ; dots ; are ; the : median$", fontdict={'size':12}, color='firebrick') # Decorations red_patch = plt.plot([],[], marker="o", ms=10, ls="", mec=None, color='firebrick', label="Median") plt.legend(handles=red_patch) ax.set_title('Distribution of City Mileage by Make', fontdict={'size':22}) ax.set_xlabel('Miles Per Gallon (City)', alpha=0.7) ax.set_yticks(df.index) ax.set_yticklabels(df.manufacturer.str.title(), fontdict={'horizontalalignment': 'right'}, alpha=0.7) ax.set_xlim(1, 40) plt.xticks(alpha=0.7) plt.gca().spines["top"].set_visible(False) plt.gca().spines["bottom"].set_visible(False) plt.gca().spines["right"].set_visible(False) plt.gca().spines["left"].set_visible(False) plt.grid(axis='both', alpha=.4, linewidth=.1) plt.show()

Machine Learning Project Walk-Through in Python: Part One

Reading through a data science book or taking a course, it can feel like you have the individual pieces but don’t quite know how to put them together. Taking the next step and solving a complete machine learning problem can be daunting, but persevering through and completing a first project will give you the confidence to tackle any data science problem. This series of articles will walk through a complete machine learning solution with a real-world dataset to let you see how all the pieces come together. We’ll follow the general machine learning workflow step-by-step:
  • Data cleaning and formatting
  • Exploratory data analysis
  • Feature engineering and selection
  • Compare several machine learning models on a performance metric
  • Perform hyperparameter tuning on the best model
  • Evaluate the best model on the testing set
  • Interpret the model results
  • Draw conclusions and document work
Along the way, we’ll see how each step flows into the next and how to specifically implement each part in Python. The complete project is available on GitHub, with the first notebook here. This first article will cover steps 1–3 with the rest addressed in subsequent posts. (As a note, this problem was originally given to me as an “assignment” for a job screen at a start-up. After completing the work, I was offered the job, but then the CTO of the company quit and they weren’t able to bring on any new employees. I guess that’s how things go on the start-up scene!)

    Problem Definition

    The first step before we get coding is to understand the problem we are trying to solve and the available data. In this project, we will work with publicly available building energy data from New York City. The objective is to use the energy data to build a model that can predict the Energy Star Score of a building and interpret the results to find the factors that influence the score. The data includes the Energy Star Score, which makes this a supervised regression machine learning task: We want to develop a model that is both accurate — it can predict the Energy Star Score close to the true value — and interpretable — we can understand the model predictions. Once we know the goal, we can use it to guide our decisions as we dig into the data and build models.

    Data Cleaning

    Contrary to what most data science courses would have you believe, not every dataset is a perfectly curated group of observations with no missing values or anomalies (looking at you, mtcars and iris datasets). Real-world data is messy, which means we need to clean and wrangle it into an acceptable format before we can even start the analysis. Data cleaning is an un-glamorous but necessary part of most actual data science problems. First, we can load in the data as a Pandas DataFrame and take a look: import pandas as pd import numpy as np # Read in data into a dataframe data = pd.read_csv('data/Energy_and_Water_Data_Disclosure_for_Local_Law_84_2017__Data_for_Calendar_Year_2016_.csv') # Display top of dataframe data.head() What Actual Data Looks Like! This is a subset of the full data, which contains 60 columns. Already, we can see a couple of issues: first, we know that we want to predict the ENERGY STAR Score, but we don’t know what any of the columns mean. While this isn’t necessarily an issue — we can often make an accurate model without any knowledge of the variables — we want to focus on interpretability, and it might be important to understand at least some of the columns. When I originally got the assignment from the start-up, I didn’t want to ask what all the column names meant, so I looked at the name of the file and decided to search for “Local Law 84”. That led me to this page, which explains that this is an NYC law requiring all buildings of a certain size to report their energy use. More searching brought me to all the definitions of the columns. Maybe looking at a file name is an obvious place to start, but for me this was a reminder to go slow so you don’t miss anything important! We don’t need to study all of the columns, but we should at least understand the Energy Star Score, which is described as: A 1-to-100 percentile ranking based on self-reported energy usage for the reporting year. The Energy Star Score is a relative measure used for comparing the energy efficiency of buildings. That clears up the first problem, but the second issue is that missing values are encoded as “Not Available”. This is a string in Python, which means that even the columns with numbers will be stored as object datatypes, because Pandas converts a column with any strings into a column of all strings. We can see the datatypes of the columns using the dataframe.info() method: # See the column data types and non-missing values data.info() Sure enough, some of the columns that clearly contain numbers (such as ft²) are stored as objects. We can’t do numerical analysis on strings, so these will have to be converted to number (specifically float) data types! Here’s a little Python code that replaces all the “Not Available” entries with not a number (np.nan), which can be interpreted as a number, and then converts the relevant columns to the float datatype. Once the correct columns are numbers, we can start to investigate the data.
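    The conversion snippet itself is not reproduced in these notes; a minimal sketch of that step, continuing from the DataFrame loaded above (the unit substrings used to pick out numeric columns are assumptions about this dataset's column names):

# Replace the "Not Available" placeholder with np.nan so Pandas treats it as missing
data = data.replace({'Not Available': np.nan})

# Convert columns whose names suggest numeric units to floats (assumed name patterns)
for col in data.columns:
    if ('ft²' in col or 'kBtu' in col or 'kWh' in col or 'therms' in col
            or 'gal' in col or 'Score' in col or 'Metric Tons CO2e' in col):
        data[col] = data[col].astype(float)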

    Missing Data and Outliers

    In addition to incorrect datatypes, another common problem when dealing with real-world data is missing values. These can arise for many reasons and have to be either filled in or removed before we train a machine learning model. First, let’s get a sense of how many missing values are in each column (see the notebook for code). (To create this table, I used a function from this Stack Overflow Forum). While we always want to be careful about removing information, if a column has a high percentage of missing values, then it probably will not be useful to our model. The threshold for removing columns should depend on the problem (here is a discussion), and for this project, we will remove any columns with more than 50% missing values. At this point, we may also want to remove outliers. These can be due to typos in data entry, mistakes in units, or they could be legitimate but extreme values. For this project, we will remove anomalies based on the definition of extreme outliers: (For the code to remove the columns and the anomalies, see the notebook). At the end of the data cleaning and anomaly removal process, we are left with over 11,000 buildings and 49 features.
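    The notebook code for the missing-value summary and outlier removal is not included above; a minimal sketch of both steps, assuming the Site EUI column is named 'Site EUI (kBtu/ft²)' as in the source dataset:

# Drop columns with more than 50% missing values
missing_pct = data.isnull().sum() / len(data) * 100
data = data.drop(columns=missing_pct[missing_pct > 50].index)

# Remove extreme outliers: points below Q1 - 3*IQR or above Q3 + 3*IQR
first_q = data['Site EUI (kBtu/ft²)'].describe()['25%']
third_q = data['Site EUI (kBtu/ft²)'].describe()['75%']
iqr = third_q - first_q
data = data[(data['Site EUI (kBtu/ft²)'] > first_q - 3 * iqr) &
            (data['Site EUI (kBtu/ft²)'] < third_q + 3 * iqr)]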

    Exploratory Data Analysis

    Now that the tedious — but necessary — step of data cleaning is complete, we can move on to exploring our data! Exploratory Data Analysis (EDA) is an open-ended process where we calculate statistics and make figures to find trends, anomalies, patterns, or relationships within the data. In short, the goal of EDA is to learn what our data can tell us. It generally starts out with a high-level overview, then narrows in to specific areas as we find interesting parts of the data. The findings may be interesting in their own right, or they can be used to inform our modeling choices, such as by helping us decide which features to use.

    Single Variable Plots

    The goal is to predict the Energy Star Score (renamed to score in our data), so a reasonable place to start is examining the distribution of this variable. A histogram is a simple yet effective way to visualize the distribution of a single variable and is easy to make using matplotlib. import matplotlib.pyplot as plt # Histogram of the Energy Star Score plt.style.use('fivethirtyeight') plt.hist(data['score'].dropna(), bins = 100, edgecolor = 'k'); plt.xlabel('Score'); plt.ylabel('Number of Buildings'); plt.title('Energy Star Score Distribution'); This looks quite suspicious! The Energy Star Score is a percentile rank, which means we would expect to see a uniform distribution, with each score assigned to the same number of buildings. However, a disproportionate number of buildings have either the highest, 100, or the lowest, 1, score (higher is better for the Energy Star Score). If we go back to the definition of the score, we see that it is based on “self-reported energy usage”, which might explain the very high scores. Asking building owners to report their own energy usage is like asking students to report their own scores on a test! As a result, this probably is not the most objective measure of a building’s energy efficiency. If we had an unlimited amount of time, we might want to investigate why so many buildings have very high and very low scores, which we could do by selecting these buildings and seeing what they have in common. However, our objective is only to predict the score and not to devise a better method of scoring buildings! We can make a note in our report that the scores have a suspect distribution, but our main focus is on predicting the score.

    Looking for Relationships

    A major part of EDA is searching for relationships between the features and the target. Variables that are correlated with the target are useful to a model because they can be used to predict the target. One way to examine the effect of a categorical variable (one that takes on only a limited set of values) on the target is through a density plot using the seaborn library. A density plot can be thought of as a smoothed histogram because it shows the distribution of a single variable. We can color a density plot by class to see how a categorical variable changes the distribution. The following code makes a density plot of the Energy Star Score colored by the type of building (limited to building types with more than 100 data points): We can see that the building type has a significant impact on the Energy Star Score. Office buildings tend to have a higher score while Hotels have a lower score. This tells us that we should include the building type in our modeling because it does have an impact on the target. As a categorical variable, we will have to one-hot encode the building type. A similar plot can be used to show the Energy Star Score by borough: The borough does not seem to have as large of an impact on the score as the building type. Nonetheless, we might want to include it in our model because there are slight differences between the boroughs. To quantify relationships between variables, we can use the Pearson Correlation Coefficient. This is a measure of the strength and direction of a linear relationship between two variables. A score of +1 is a perfectly linear positive relationship and a score of -1 is a perfectly negative linear relationship. Several values of the correlation coefficient are shown below: Values of the Pearson Correlation Coefficient (Source) While the correlation coefficient cannot capture non-linear relationships, it is a good way to start figuring out how variables are related. In Pandas, we can easily calculate the correlations between any columns in a dataframe: # Find all correlations with the score and sort correlations_data = data.corr()['score'].sort_values() The most negative (left) and positive (right) correlations with the target: There are several strong negative correlations between the features and the target, with the most negative being the different categories of EUI (these measures vary slightly in how they are calculated). The EUI — Energy Use Intensity — is the amount of energy used by a building divided by the square footage of the building. It is meant to be a measure of the efficiency of a building, with a lower score being better. Intuitively, these correlations make sense: as the EUI increases, the Energy Star Score tends to decrease.
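    The density-plot code referred to above is not reproduced in these notes; a minimal sketch (the column name 'Largest Property Use Type' is an assumption about this dataset):

import seaborn as sns
import matplotlib.pyplot as plt

# Keep only building types with more than 100 buildings
types = data.dropna(subset=['score'])
type_counts = types['Largest Property Use Type'].value_counts()
keep_types = list(type_counts[type_counts > 100].index)

# One KDE curve of the score per building type
plt.figure(figsize=(12, 10))
for b_type in keep_types:
    subset = types[types['Largest Property Use Type'] == b_type]
    sns.kdeplot(subset['score'].dropna(), label=b_type)
plt.xlabel('Energy Star Score'); plt.ylabel('Density')
plt.title('Density Plot of Energy Star Scores by Building Type')
plt.legend()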

    Two-Variable Plots

    To visualize relationships between two continuous variables, we use scatterplots. We can include additional information, such as a categorical variable, in the color of the points. For example, the following plot shows the Energy Star Score vs. Site EUI colored by the building type: This plot lets us visualize what a correlation coefficient of -0.7 looks like. As the Site EUI decreases, the Energy Star Score increases, a relationship that holds steady across the building types. The final exploratory plot we will make is known as the Pairs Plot. This is a great exploration tool because it lets us see relationships between multiple pairs of variables as well as distributions of single variables. Here we are using the seaborn visualization library and the PairGrid function to create a Pairs Plot with scatterplots on the upper triangle, histograms on the diagonal, and 2D kernel density plots and correlation coefficients on the lower triangle. To see interactions between variables, we look for where a row intersects with a column. For example, to see the correlation of Weather Norm EUI with score, we look in the Weather Norm EUI row and the score column and see a correlation coefficient of -0.67. In addition to looking cool, plots such as these can help us decide which variables to include in modeling.
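    A minimal PairGrid sketch of that layout (plot_data is assumed to be a small DataFrame holding the handful of numeric columns being compared; the correlation annotations on the lower triangle are omitted here):

import seaborn as sns
import matplotlib.pyplot as plt

grid = sns.PairGrid(data=plot_data)
grid.map_upper(plt.scatter, color='red', alpha=0.6)        # scatterplots above the diagonal
grid.map_diag(plt.hist, color='red', edgecolor='black')    # histograms on the diagonal
grid.map_lower(sns.kdeplot, cmap=plt.cm.Reds)              # 2D KDE below the diagonal
plt.show()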

    Feature Engineering and Selection

    Feature engineering and selection often provide the greatest return on time invested in a machine learning problem. First of all, let’s define what these two tasks are: feature engineering is the process of constructing new features from the raw data, while feature selection is the process of keeping only the most relevant features. A machine learning model can only learn from the data we provide it, so ensuring that data includes all the relevant information for our task is crucial. If we don’t feed a model the correct data, then we are setting it up to fail and we should not expect it to learn! For this project, we will take the following feature engineering steps: One-hot encoding is necessary to include categorical variables in a model. A machine learning algorithm cannot understand a building type of “office”, so we have to record it as a 1 if the building is an office and a 0 otherwise. Adding transformed features can help our model learn non-linear relationships within the data. Taking the square root, natural log, or various powers of features is common practice in data science and can be based on domain knowledge or what works best in practice. Here we will include the natural log of all numerical features. The following code selects the numeric features, takes log transformations of these features, selects the two categorical features, one-hot encodes these features, and joins the two sets together. This seems like a lot of work, but it is relatively straightforward in Pandas! After this process we have over 11,000 observations (buildings) with 110 columns (features). Not all of these features are likely to be useful for predicting the Energy Star Score, so now we will turn to feature selection to remove some of the variables.
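    That selection-and-transformation code is not included above; a minimal sketch continuing from the cleaned DataFrame (the categorical column names 'Borough' and 'Largest Property Use Type' are assumptions about this dataset):

# Numeric columns, plus a natural-log copy of each (skipping the target itself)
numeric_subset = data.select_dtypes('number')
for col in numeric_subset.columns:
    if col != 'score':
        numeric_subset['log_' + col] = np.log(numeric_subset[col])

# One-hot encode the two categorical columns
categorical_subset = pd.get_dummies(data[['Borough', 'Largest Property Use Type']])

# Join the two sets side by side
features = pd.concat([numeric_subset, categorical_subset], axis=1)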

    Feature Selection

    Many of the 110 features we have in our data are redundant because they are highly correlated with one another. For example, here is a plot of Site EUI vs Weather Normalized Site EUI, which have a correlation coefficient of 0.997. Features that are strongly correlated with each other are known as collinear, and removing one of the variables in these pairs of features can often help a machine learning model generalize and be more interpretable. (I should point out we are talking about correlations of features with other features, not correlations with the target, which help our model!) There are a number of methods to calculate collinearity between features, with one of the most common being the variance inflation factor. In this project, we will use the correlation coefficient to identify and remove collinear features. We will drop one of a pair of features if the correlation coefficient between them is greater than 0.6. For the implementation, take a look at the notebook (and this Stack Overflow answer). While this value may seem arbitrary, I tried several different thresholds, and this choice yielded the best model. Machine learning is an empirical field and is often about experimenting and finding what performs best! After feature selection, we are left with 64 total features and 1 target. # Remove any columns with all na values features = features.dropna(axis=1, how = 'all') print(features.shape) (11319, 65)
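    A minimal sketch of that threshold-based removal (a hypothetical helper, not the notebook's exact implementation; the real version also makes sure the target column is never dropped):

import numpy as np

def remove_collinear_features(x, threshold=0.6):
    # Absolute pairwise correlations, keeping only the upper triangle to avoid double counting
    corr = x.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape), k=1).astype(bool))
    # Drop one column from every pair whose correlation exceeds the threshold
    to_drop = [col for col in upper.columns if any(upper[col] > threshold)]
    return x.drop(columns=to_drop)

features = remove_collinear_features(features, threshold=0.6)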

    Establishing a Baseline

    We have now completed data cleaning, exploratory data analysis, and feature engineering. The final step to take before getting started with modeling is establishing a naive baseline. This is essentially a guess against which we can compare our results. If the machine learning models do not beat this guess, then we might have to conclude that machine learning is not suitable for the task, or we might need to try a different approach. For regression problems, a reasonable naive baseline is to guess the median value of the target on the training set for all the examples in the test set. This sets a relatively low bar for any model to surpass. The metric we will use is mean absolute error (MAE), which measures the average absolute error on the predictions. There are many metrics for regression, but I like Andrew Ng’s advice to pick a single metric and then stick to it when evaluating models. The mean absolute error is easy to calculate and is interpretable. Before calculating the baseline, we need to split our data into a training and a testing set:
  • The training set of features is what we provide to our model during training along with the answers. The goal is for the model to learn a mapping between the features and the target.
  • The testing set of features is used to evaluate the trained model. The model is not allowed to see the answers for the testing set and must make predictions using only the features. We know the answers for the test set so we can compare the test predictions to the answers.
  • We will use 70% of the data for training and 30% for testing: # Split into 70% training and 30% testing set X, X_test, y, y_test = train_test_split(features, targets, test_size = 0.3, random_state = 42) Now we can calculate the naive baseline performance: The baseline guess is a score of 66.00 Baseline Performance on the test set: MAE = 24.5164 The naive estimate is off by about 25 points on the test set. The score ranges from 1–100, so this represents an error of 25%, quite a low bar to surpass!
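    Expanding the snippet above into a self-contained sketch (assuming the features and targets frames built during feature engineering; train_test_split comes from scikit-learn):

import numpy as np
from sklearn.model_selection import train_test_split

# Split into 70% training and 30% testing set
X, X_test, y, y_test = train_test_split(features, targets,
                                        test_size=0.3, random_state=42)

# Guess the median of the training targets for every test example
baseline_guess = np.median(y)
baseline_mae = np.mean(abs(y_test - baseline_guess))
print('The baseline guess is a score of {:.2f}'.format(baseline_guess))
print('Baseline Performance on the test set: MAE = {:.4f}'.format(baseline_mae))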

    Conclusions

    In this article we walked through the first three steps of a machine learning problem. After defining the question, we:
  • Cleaned and formatted the raw data
  • Performed an exploratory data analysis to learn about the dataset
  • Developed a set of features that we will use for our models
  • Finally, we also completed the crucial step of establishing a baseline against which we can judge our machine learning algorithms. The second post (available here) will show how to evaluate machine learning models using Scikit-Learn, select the best model, and perform hyperparameter tuning to optimize the model. The third post, dealing with model interpretation and reporting results, is here.

    Keras vs PyTorch for Deep Learning

    There are many deep learning frameworks and libraries. This article compares two popular ones, Keras and PyTorch, because both are easy to pick up and beginners can master them quickly. We train the same simple model with Keras and with PyTorch. The biggest difference between the two tools is that PyTorch runs in eager mode by default, while Keras runs on top of TensorFlow and other frameworks (today mainly TensorFlow), whose default is graph mode. Recent versions of TensorFlow also offer an eager mode similar to PyTorch's, but it is slower. If you are familiar with NumPy, you can think of PyTorch as NumPy with GPU support. In addition, there are now several libraries with high-level APIs (like Keras) that use PyTorch as their backend, such as Fastai, Lightning and Ignite; if they interest you, that is one more reason to choose PyTorch. Keras ships with some sample datasets, such as the MNIST handwritten-digit dataset. The code in the original article loads this data, with the dataset images stored as NumPy arrays, and Keras also does a little image preprocessing to make the data suitable for the model. The article's next snippet defines the model: in Keras (TensorFlow), we first have to define everything we want to use and then run it all at once. In Keras we cannot experiment anywhere at any time, whereas in PyTorch we can. The training and evaluation code follows; we can use the save() function to save the model so it can later be loaded with load_model(), and the predict() function to get the model's output on the test data. Having skimmed the basic Keras model implementation, we now turn to PyTorch. Model implementation in PyTorch: researchers mostly use PyTorch because it is flexible and its coding style is experimental. You can tweak anything in PyTorch and control everything, but control also comes with responsibility. Experimenting in PyTorch is easy, because you do not have to define everything up front before running it; we can easily test each step. As a result, debugging in PyTorch is somewhat easier than in Keras. Next comes a simple digit-recognition model. The article's code imports the required libraries and defines some variables; variables such as n_epochs and momentum are hyperparameters that must be set. We will not discuss the details here; the goal is to understand the structure of the code. The next snippet declares the data loaders used to load the training data in batches; there are many ways to download the data, and they are not tied to the framework. If you are just starting to learn deep learning, this code may look rather complex. The model is then defined, using the usual way of building a network in PyTorch: we extend nn.Module and call the forward() function in the forward pass. The PyTorch implementation is fairly direct and can be modified as needed. The following snippet defines the training and testing functions. In Keras, calling fit() does all of this automatically, but in PyTorch we have to perform these steps by hand; high-level API libraries such as Fastai simplify this and need less training code. Finally, the model is saved and loaded for further training or prediction; there is not much difference here, and PyTorch models usually have a pt or pth extension. As for advice on choosing a framework: once you have learned one of them and understood its concepts, switching to the other is not hard, it just takes some time. Colab links: PyTorch: https://colab.research.google.com/drive/1irYr0byhK6XZrImiY4nt9wX0fRp3c9mx?usp=sharing Keras: https://colab.research.google.com/drive/1QH6VOY_uOqZ6wjxP0K8anBAXmI0AwQCm?usp=sharing Original article: https://medium.com/@karan_jakhar/keras-vs-pytorch-dilemma-dc434e5b5ae0 Automated data augmentation: practice, theory and new directions
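    The article's own code snippets are not reproduced in these notes. As a point of reference, here is a minimal Keras-side sketch of the kind of MNIST model it describes (standard tf.keras APIs, not the article's exact code):

import tensorflow as tf
from tensorflow import keras

# Load and normalize the MNIST digits that ship with Keras
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Define everything up front, then compile and run - the workflow described above
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=3)
model.evaluate(x_test, y_test)

model.save('mnist_model.h5')          # reload later with keras.models.load_model()
predictions = model.predict(x_test)   # model output on the test data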

    Building Python C/C++ Extensions with CMake

    众所周知,Python 语言的性能相比其他语言如 C/C++ 等要弱很多,所以当我们需要高性能的时候往往借助于 Python 的 C/C++ 扩展或者 Cython。 通常构建 Python C/C++ 扩展会使用 distutils 的 Extension 类,需要在 setup.py 中配置头文件包含路径 include_dirs、C/C++ 源文件路径 sources 等,比如下面这个Python 官方文档上的例子: from distutils.core import setup, Extension module1 = Extension( 'demo', define_macros=[ ('MAJOR_VERSION', '1'), ('MINOR_VERSION', '0') ], include_dirs=['/usr/local/include'], libraries =['tcl83'], library_dirs=['/usr/local/lib'], sources=['demo.c'] ) setup( name='PackageName', version='1.0', description='This is a demo package', author='Martin v. Loewis', author_email='martin@v.loewis.de', url='https://docs.python.org/extending/building', long_description=''' This is really just a demo package. ''', ext_modules=[module1] ) 这种方式对于绝大多数简单的项目应该是足够了,而当你需要用到一些 C/C++ 第三方库的时候可能会遇到因为某些原因需要将三方库的源码和项目源码一起进行编译的情况(比如 abseil-cpp),这个情况下往往会遇到 C/C++ 依赖管理的问题,CMake 则是常用的 C/C++ 依赖管理工具,本文将总结、分享一下使用 CMake 来构建 Python C/C++ 扩展的方案。 调研可选方案 首先来看一下 CMake 项目本身一般是如何构建的,一般 CMake 项目都会在项目根目录下有个 CMakeLists.txt 的 CMake 项目定义文件,构建方式通常如下: mkdir build cd build cmake .. make 那基本的思路就是在 Python 包构建过程中(pip install 或者 python setup.py install 等)调用上述命令完成扩展构建。通过 Google 搜索可以发现,一个方案是通过继承 distutils 的 Extension 来手工实现,另一个方案则是用别人写好的现成的封装库 scikit-build。 方案一 distutils CMake extension 这个方案有个现成的例子,pybind11 的 CMake 示例项目,BTW,pybind11 也是一个写 Python C++ 扩展的项目。看一下它的 setup.py 的代码: import os import re import sys import platform import subprocess from setuptools import setup, Extension from setuptools.command.build_ext import build_ext from distutils.version import LooseVersion class CMakeExtension(Extension): def __init__(self, name, sourcedir=''): Extension.__init__(self, name, sources=[]) self.sourcedir = os.path.abspath(sourcedir) class CMakeBuild(build_ext): def run(self): try: out = subprocess.check_output(['cmake', '--version']) except OSError: raise RuntimeError("CMake must be installed to build the following extensions: " + ", ".join(e.name for e in self.extensions)) if platform.system() == "Windows": cmake_version = LooseVersion(re.search(r'version\s*([\d.]+)', out.decode()).group(1)) if cmake_version < '3.1.0': raise RuntimeError("CMake >= 3.1.0 is required on Windows") for ext in self.extensions: self.build_extension(ext) def build_extension(self, ext): extdir = os.path.abspath(os.path.dirname(self.get_ext_fullpath(ext.name))) # required for auto-detection of auxiliary "native" libs if not extdir.endswith(os.path.sep): extdir += os.path.sep cmake_args = ['-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=' + extdir, '-DPYTHON_EXECUTABLE=' + sys.executable] cfg = 'Debug' if self.debug else 'Release' build_args = ['--config', cfg] if platform.system() == "Windows": cmake_args += ['-DCMAKE_LIBRARY_OUTPUT_DIRECTORY_{}={}'.format(cfg.upper(), extdir)] if sys.maxsize > 2**32: cmake_args += ['-A', 'x64'] build_args += ['--', '/m'] else: cmake_args += ['-DCMAKE_BUILD_TYPE=' + cfg] build_args += ['--', '-j2'] env = os.environ.copy() env['CXXFLAGS'] = '{} -DVERSION_INFO=\\"{}\\"'.format(env.get('CXXFLAGS', ''), self.distribution.get_version()) if not os.path.exists(self.build_temp): os.makedirs(self.build_temp) subprocess.check_call(['cmake', ext.sourcedir] + cmake_args, cwd=self.build_temp, env=env) subprocess.check_call(['cmake', '--build', '.'] + build_args, cwd=self.build_temp) setup( name='cmake_example', version='0.0.1', author='Dean Moldovan', author_email='dean0x7d@gmail.com', description='A test project using pybind11 and CMake', long_description='', ext_modules=[CMakeExtension('cmake_example')], cmdclass=dict(build_ext=CMakeBuild), 
zip_safe=False, ) 可以看出,它通过重写 setuptools 的 build_ext cmdclass 在构建过程中调用了 cmake 命令完成扩展的构建。 这个方案比较适合 pybind11 的项目,因为它已经提供了很多 CMake 的 module 比如怎么找到 Python.h、libpython 等,打开示例项目的 CMakeLists.txt 可以发现它使用了一个 pybind11 提供的 CMake 函数 pybind11_add_module 来定义 Python 扩展,免去了很多繁琐的配置。 cmake_minimum_required(VERSION 2.8.12) project(cmake_example) add_subdirectory(pybind11) pybind11_add_module(cmake_example src/main.cpp) 如果不使用 pybind11 则比较麻烦,看看 Apache Arrow Python 包的 CMakeLists.txt 感受一下。 方案二 scikit-build scikit-build 是一个增强的 Python C/C++/Fortan/Cython 扩展构建系统生成器,本质上也是 Python setuptools 和 CMake 的胶水。 我们看一下 sciket-build 的 hello-cpp 示例: setup.py import sys from skbuild import setup # Require pytest-runner only when running tests pytest_runner = (['pytest-runner>=2.0,<3dev'] if any(arg in sys.argv for arg in ('pytest', 'test')) else []) setup_requires = pytest_runner setup( name="hello-cpp", version="1.2.3", description="a minimal example package (cpp version)", author='The scikit-build team', license="MIT", packages=['hello'], tests_require=['pytest'], setup_requires=setup_requires ) 基本上就是一个 setuptools.setup 的完整替代,不再使用 from setuptools import set 转用 from skbuild import setup。 CMakeLists.txt cmake_minimum_required(VERSION 3.4.0) project(hello) find_package(PythonExtensions REQUIRED) add_library(_hello MODULE hello/_hello.cxx) python_extension_module(_hello) install(TARGETS _hello LIBRARY DESTINATION hello) 这里没有看到类似上面 pybind11 CMake 示例中的 add_subdirectory(pybind11) 语句,而是直接用的 find_package(PythonExtensions REQUIRED) 和 python_extension_module CMake 函数: PythonExtensions 的 CMake 定义已经打包在 scikit-build 中 调用 skbuild.setup 的过程中 scikit-build 自动把它打包的 CMake 定义文件加载了所以上面才不需要像 pybind11 那样做 install(TARGETS _hello LIBRARY DESTINATION hello) 将构建好的扩展的动态链接库复制到 hello/ 目录中,从而可以在 Python 中使用 from hello._hello import hello 导入扩展中的 hello 函数 通常还会增加 pyproject.toml 来安装 pip 构建时候需要的依赖包: [build-system] requires = ["setuptools", "wheel", "scikit-build", "cmake", "ninja"] 比较有意思的是,scikit-build 并不需要你的系统上全局安装 CMake/Ninja,它打包了 manylinux 的 CMake 和 Ninja 的二进制 wheels 并发布到了 PyPi 上,cool. scikit-build 还支持类似的方式构建使用 Cython 和 pybind11 等的扩展,功能强大非常方便。 后记 最近工作中完成了使用 CMake 和 scikit-build 改造一个 C++ 和 Cython 写的 Python 扩展项目以便能够使用 abseil-cpp 的 Swiss Tables 优化性能,这篇文章差不多就是 brain dump 一下调研的过程,后面我想写一下如何在 Cython 中使用 abseil-cpp 的 containers 的文章,stay tuned. GENERATING C++ CODE USING PYTHON AND CMAKE Building and testing a hybrid Python/C++ package C++ Developer Guide Cython 在 Cython 项目中使用 abseil-cpp

    Python Exception Handling Using try, except and finally statement

Exceptions in Python

Python has many built-in exceptions that are raised when your program encounters an error. When one of these exceptions occurs, the Python interpreter stops the current process and passes it to the calling process until it is handled; if it is never handled, the program crashes. For example, consider a program where function A calls function B, which in turn calls function C. If an exception occurs in C but is not handled there, it propagates to B and then to A. If it is still unhandled, an error message is displayed and the program comes to a sudden, unexpected halt.

Catching Exceptions in Python

In Python, exceptions are handled using a try statement. The critical operation that can raise an exception is placed inside the try clause, and the code that handles the exception is written in the except clause. We can thus choose what to do once we have caught the exception. Here is a simple example:

# import module sys to get the type of exception
import sys

randomList = ['a', 0, 2]

for entry in randomList:
    try:
        print("The entry is", entry)
        r = 1/int(entry)
        break
    except:
        print("Oops!", sys.exc_info()[0], "occurred.")
        print("Next entry.")
        print()
print("The reciprocal of", entry, "is", r)

Output:

The entry is a
Oops! <class 'ValueError'> occurred.
Next entry.

The entry is 0
Oops! <class 'ZeroDivisionError'> occurred.
Next entry.

The entry is 2
The reciprocal of 2 is 0.5

In this program, we loop through the values of randomList. As previously mentioned, the portion that can cause an exception is placed inside the try block. If no exception occurs, the except block is skipped and normal flow continues (for the last value). If an exception does occur, it is caught by the except block (first and second values). Here we print the name of the exception using the exc_info() function from the sys module: 'a' causes a ValueError and 0 causes a ZeroDivisionError. Since every exception in Python inherits from the base Exception class, we can also perform the same task in the following way:

# import module sys to get the type of exception
import sys

randomList = ['a', 0, 2]

for entry in randomList:
    try:
        print("The entry is", entry)
        r = 1/int(entry)
        break
    except Exception as e:
        print("Oops!", e.__class__, "occurred.")
        print("Next entry.")
        print()
print("The reciprocal of", entry, "is", r)

This program has the same output as the one above.

Catching Specific Exceptions in Python

In the examples above, we did not name any specific exception in the except clause. This is not good programming practice, because it catches all exceptions and handles every case the same way. We can specify which exceptions an except clause should catch. A try clause can have any number of except clauses to handle different exceptions, but only one of them is executed when an exception occurs. A tuple of values can be used to specify multiple exceptions in a single except clause. Here is example pseudo code:

try:
    # do something
    pass
except ValueError:
    # handle ValueError exception
    pass
except (TypeError, ZeroDivisionError):
    # handle multiple exceptions: TypeError and ZeroDivisionError
    pass
except:
    # handle all other exceptions
    pass

Raising Exceptions in Python

In Python, exceptions are raised when errors occur at runtime, but we can also raise them manually with the raise keyword. We can optionally pass values to the exception to clarify why it was raised.

>>> raise KeyboardInterrupt
Traceback (most recent call last):
...
KeyboardInterrupt

>>> raise MemoryError("This is an argument")
Traceback (most recent call last):
...
MemoryError: This is an argument

>>> try:
...     a = int(input("Enter a positive integer: "))
...     if a <= 0:
...         raise ValueError("That is not a positive number!")
... except ValueError as ve:
...     print(ve)
...
Enter a positive integer: -2
That is not a positive number!

Python try...finally

The try statement in Python can have an optional finally clause. This clause is executed no matter what and is generally used to release external resources. For example, we may be connected to a remote data center over the network, or working with a file or a graphical user interface (GUI). In all these circumstances we must clean up the resource before the program halts, whether it ran successfully or not. These actions (closing a file or GUI, disconnecting from the network) are performed in the finally clause to guarantee their execution. Here is an example using file operations:

try:
    f = open("test.txt", encoding='utf-8')
    # perform file operations
finally:
    f.close()

This construct makes sure that the file is closed even if an exception occurs during execution.
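One related idiom is worth adding here (it is not part of the original write-up): for resources that support the context-manager protocol, the with statement is the usual shorthand for the try...finally pattern above. A minimal sketch, reusing the same test.txt file as the example:

# The with statement closes the file automatically, even if an exception
# is raised inside the block; it is equivalent to the try/finally version above.
with open("test.txt", encoding="utf-8") as f:
    data = f.read()

# try/except/else/finally combined, for completeness:
try:
    f = open("test.txt", encoding="utf-8")
except FileNotFoundError as err:
    print("Could not open the file:", err)
else:
    try:
        print(f.read())
    finally:
        f.close()   # always runs, whether reading succeeded or not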

Fancy Python Operations in Under 10 Lines

1. Generating a QR code

QR codes are an important tool for passing information around in today's world, and generating one is very simple: in Python the MyQR module can do it in just two lines of code. First install MyQR (here using the Tsinghua mirror):

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple/ myqr

Once installed we can write the code:

from MyQR import myqr   # note the capitalization
myqr.run(words='http://www.baidu.com')   # a URL opens when scanned; plain text is shown directly; Chinese text is not supported

Running this creates a QR code image in the project directory. We can also dress the QR code up:

from MyQR import myqr
myqr.run(
    words='http://www.baidu.com',   # the encoded content
    picture='lbxx.jpg',             # background picture
    colorized=True,                 # colored output; False gives black and white
    save_name='code.png'            # output file name
)

The result looks like the image below. MyQR also supports animated backgrounds.

2. Generating a word cloud

A word cloud is a very elegant form of data visualization: it makes it easy to see at a glance which words appear most frequently. In Python we can generate word clouds with the wordcloud module. Install it first:

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple/ wordcloud

Then write the code:

from wordcloud import WordCloud
wc = WordCloud()                                       # create the word cloud object
wc.generate('Do not go gentle into that good night')   # generate the word cloud
wc.to_file('wc.png')                                   # save it

Running this produces the word cloud shown below. This is of course only the simplest case; for more detailed options see "WordCloud生成卡卡西忍术词云" [1].

3. Batch background removal

Background removal relies on Baidu PaddlePaddle's deep learning tooling; we only need to install two packages to remove backgrounds in batch. The first is PaddlePaddle:

python -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple

The second is the paddlehub model library:

pip install -i https://mirror.baidu.com/pypi/simple paddlehub

For more detailed installation notes see the PaddlePaddle site: https://www.paddlepaddle.org.cn/ After that, five lines of code are enough for batch background removal:

import os, paddlehub as hub
humanseg = hub.Module(name='deeplabv3p_xception65_humanseg')   # load the model
path = 'D:/CodeField/Workplace/PythonWorkplace/GrapImage/'     # image directory
files = [path + i for i in os.listdir(path)]                   # build the file list
results = humanseg.segmentation(data={'image': files})         # remove the backgrounds

The result is shown below: the original image on the left, and on the right the cut-out subject on a yellow background.

4. Text sentiment recognition

With PaddlePaddle, natural language processing also becomes very simple. Text sentiment recognition again requires PaddlePaddle and PaddleHub; install them as described in section 3. Then comes the code:

import paddlehub as hub
senta = hub.Module(name='senta_lstm')   # load the model
sentence = [                            # the sentences to analyse
    '你真美', '你真丑', '我好难过', '我不开心', '这个游戏好好玩', '什么垃圾游戏',
]
results = senta.sentiment_classify(data={"text": sentence})   # run sentiment recognition
for result in results:                  # print the results
    print(result)

The result is a list of dictionaries:

{'text': '你真美', 'sentiment_label': 1, 'sentiment_key': 'positive', 'positive_probs': 0.9602, 'negative_probs': 0.0398}
{'text': '你真丑', 'sentiment_label': 0, 'sentiment_key': 'negative', 'positive_probs': 0.0033, 'negative_probs': 0.9967}
{'text': '我好难过', 'sentiment_label': 1, 'sentiment_key': 'positive', 'positive_probs': 0.5324, 'negative_probs': 0.4676}
{'text': '我不开心', 'sentiment_label': 0, 'sentiment_key': 'negative', 'positive_probs': 0.1936, 'negative_probs': 0.8064}
{'text': '这个游戏好好玩', 'sentiment_label': 1, 'sentiment_key': 'positive', 'positive_probs': 0.9933, 'negative_probs': 0.0067}
{'text': '什么垃圾游戏', 'sentiment_label': 0, 'sentiment_key': 'negative', 'positive_probs': 0.0108, 'negative_probs': 0.9892}

The sentiment_key field carries the sentiment. For a more detailed walk-through see "Python自然语言处理只需要5行代码" [2].

5. Detecting whether a face mask is worn

This again uses a PaddlePaddle product. Install PaddlePaddle and PaddleHub as above, then write the code:

import paddlehub as hub
module = hub.Module(name='pyramidbox_lite_mobile_mask')   # load the model
image_list = ['face.jpg']                                 # list of images
input_dict = {'image': image_list}                        # build the input dictionary
module.face_detection(data=input_dict)                    # detect whether a mask is worn

After running this, a detection_result folder is created in the project directory containing the recognition results, which look like this:

6. Simple message bombing

Python can drive input devices in several ways, for example with the win32 or pynput modules, and a simple loop is enough to flood a chat window with messages. Using pynput as the example, first install the module:

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple/ pynput

Before writing the main script we need the coordinates of the input box, obtained by hand:

from pynput import mouse
m_mouse = mouse.Controller()   # create a mouse controller
print(m_mouse.position)        # print the current mouse position

There may be a more elegant way, but I don't know one. Record the coordinates and do not move the chat window afterwards. Then run the following code and switch to the chat window:

import time
from pynput import mouse, keyboard

time.sleep(5)
m_mouse = mouse.Controller()             # create a mouse controller
m_keyboard = keyboard.Controller()       # create a keyboard controller
m_mouse.position = (850, 670)            # move the mouse to the recorded position
m_mouse.click(mouse.Button.left)         # click the left mouse button
while True:
    m_keyboard.type('你好')               # type the message
    m_keyboard.press(keyboard.Key.enter)  # press Enter
    m_keyboard.release(keyboard.Key.enter)  # release Enter
    time.sleep(0.5)                       # wait 0.5 seconds

Admittedly this is more than 10 lines of code, and it is not exactly high-end. The effect of spamming a secondary QQ account looks like this:

7. Recognising text in images

We can recognise text in images with Tesseract. Doing this from Python is very simple, but the up-front work of downloading the binaries and configuring environment variables is a bit tedious, so only the code is shown here:

import pytesseract
from PIL import Image

img = Image.open('text.jpg')
text = pytesseract.image_to_string(img)
print(text)

Here text is the recognised text. If the accuracy is not good enough, Baidu's general-purpose OCR API is an alternative.

8. Plotting a function

Charts are an important data-visualization tool, and in Python matplotlib plays a central role here. This is how to plot a function with matplotlib:

import numpy as np
from matplotlib import pyplot as plt

x = np.arange(1, 11)     # x-axis data
y = x * x + 5            # the function
plt.title("y=x*x+5")     # chart title
plt.xlabel("x")          # x-axis label
plt.ylabel("y")          # y-axis label
plt.plot(x, y)           # draw the curve
plt.show()               # display the figure

The resulting figure looks like this:

9. Artificial intelligence

Finally, an exclusive piece of "artificial intelligence" that is normally not shared. It can answer many questions, although of course AI is still developing and has a long way to go before it truly understands human language. Without further ado, meet our AI, Fdj:

while True:
    question = input()
    answer = question.replace('吗', '呢')
    answer = answer.replace('?', '!')
    print(answer)

A quick test (question / answer):

你好吗? (Are you well?)
我好呢! (I am well!)
你吃饭了吗? (Have you eaten?)
我吃饭了呢! (I have eaten!)
你要睡了吗? (Are you going to sleep?)
我要睡了呢! (I am going to sleep!)

It seems our little bot is fairly "intelligent" after all.

    References

    [1] WordCloud生成卡卡西忍术词云: https://blog.csdn.net/ZackSock/article/details/103517841

    Density and Contour Plots

    Sometimes it is useful to display three-dimensional data in two dimensions using contours or color-coded regions. There are three Matplotlib functions that can be helpful for this task: plt.contour for contour plots, plt.contourf for filled contour plots, and plt.imshow for showing images. This section looks at several examples of using these. We'll start by setting up the notebook for plotting and importing the functions we will use: In [1]: %matplotlib inline import matplotlib.pyplot as plt plt.style.use('seaborn-white') import numpy as np

    Visualizing a Three-Dimensional Function

    We'll start by demonstrating a contour plot using a function $z = f(x, y)$, using the following particular choice for $f$ (we've seen this before in Computation on Arrays: Broadcasting, when we used it as a motivating example for array broadcasting): In [2]: def f(x, y): return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x) A contour plot can be created with the plt.contour function. It takes three arguments: a grid of x values, a grid of y values, and a grid of z values. The x and y values represent positions on the plot, and the z values will be represented by the contour levels. Perhaps the most straightforward way to prepare such data is to use the np.meshgrid function, which builds two-dimensional grids from one-dimensional arrays: In [3]: x = np.linspace(0, 5, 50) y = np.linspace(0, 5, 40) X, Y = np.meshgrid(x, y) Z = f(X, Y) Now let's look at this with a standard line-only contour plot: In [4]: plt.contour(X, Y, Z, colors='black'); Notice that by default when a single color is used, negative values are represented by dashed lines, and positive values by solid lines. Alternatively, the lines can be color-coded by specifying a colormap with the cmap argument. Here, we'll also specify that we want more lines to be drawn—20 equally spaced intervals within the data range: In [5]: plt.contour(X, Y, Z, 20, cmap='RdGy'); Here we chose the RdGy (short for Red-Gray) colormap, which is a good choice for centered data. Matplotlib has a wide range of colormaps available, which you can easily browse in IPython by doing a tab completion on the plt.cm module: plt.cm.<TAB> Our plot is looking nicer, but the spaces between the lines may be a bit distracting. We can change this by switching to a filled contour plot using the plt.contourf() function (notice the f at the end), which uses largely the same syntax as plt.contour(). Additionally, we'll add a plt.colorbar() command, which automatically creates an additional axis with labeled color information for the plot: In [6]: plt.contourf(X, Y, Z, 20, cmap='RdGy') plt.colorbar(); The colorbar makes it clear that the black regions are "peaks," while the red regions are "valleys." One potential issue with this plot is that it is a bit "splotchy." That is, the color steps are discrete rather than continuous, which is not always what is desired. This could be remedied by setting the number of contours to a very high number, but this results in a rather inefficient plot: Matplotlib must render a new polygon for each step in the level. A better way to handle this is to use the plt.imshow() function, which interprets a two-dimensional grid of data as an image. The following code shows this: In [7]: plt.imshow(Z, extent=[0, 5, 0, 5], origin='lower', cmap='RdGy') plt.colorbar() plt.axis(aspect='image'); There are a few potential gotchas with imshow(), however: Finally, it can sometimes be useful to combine contour plots and image plots. For example, here we'll use a partially transparent background image (with transparency set via the alpha parameter) and overplot contours with labels on the contours themselves (using the plt.clabel() function): In [8]: contours = plt.contour(X, Y, Z, 3, colors='black') plt.clabel(contours, inline=True, fontsize=8) plt.imshow(Z, extent=[0, 5, 0, 5], origin='lower', cmap='RdGy', alpha=0.5) plt.colorbar(); The combination of these three functions—plt.contour, plt.contourf, and plt.imshow—gives nearly limitless possibilities for displaying this sort of three-dimensional data within a two-dimensional plot. 
For more information on the options available in these functions, refer to their docstrings. If you are interested in three-dimensional visualizations of this type of data, see Three-dimensional Plotting in Matplotlib. Density Contours Example simple contour plot import numpy as np from matplotlib.colors import LogNorm from matplotlib import pyplot as plt plt.interactive(True) fig=plt.figure(1) plt.clf() # generate input data; you already have that x1 = np.random.normal(0,10,100000) y1 = np.random.normal(0,7,100000)/10. x2 = np.random.normal(-15,7,100000) y2 = np.random.normal(-10,10,100000)/10. x=np.concatenate([x1,x2]) y=np.concatenate([y1,y2]) # calculate the 2D density of the data given counts,xbins,ybins=np.histogram2d(x,y,bins=100,normed=LogNorm()) # make the contour plot plt.contour(counts.transpose(),extent=[xbins.min(),xbins.max(), ybins.min(),ybins.max()],linewidths=3,colors='black', linestyles='solid') plt.show() produces a nice contour plot. The contour function offers a lot of fancy adjustments, for example let's set the levels by hand: plt.clf() mylevels=[1.e-4, 1.e-3, 1.e-2] plt.contour(counts.transpose(),mylevels,extent=[xbins.min(),xbins.max(), ybins.min(),ybins.max()],linewidths=3,colors='black', linestyles='solid') plt.show() producing this plot: contour plot with adjusted levels And finally, in SM one can do contour plots on linear and log scales, so I spent a little time trying to figure out how to do this in matplotlib. Here is an example when the y points need to be plotted on the log scale and the x points still on the linear scale: plt.clf() # this is our new data which ought to be plotted on the log scale ynew=10**y # but the binning needs to be done in linear space counts,xbins,ybins=np.histogram2d(x,y,bins=100,normed=LogNorm()) mylevels=[1.e-4,1.e-3,1.e-2] # and the plotting needs to be done in the data (i.e., exponential) space plt.contour(xbins[:-1],10**ybins[:-1],counts.transpose(),mylevels, extent=[xbins.min(),xbins.max(),ybins.min(),ybins.max()], linewidths=3,colors='black',linestyles='solid') plt.yscale('log') plt.show() This produces a plot which looks very similar to the linear one, but with a nice vertical log axis, which is what was intended: contour plot with log axis

    repeatingtimer

repeatingtimer.py

from threading import Timer as _Timer

class Timer(_Timer):
    """
    A Timer that keeps firing every `interval` seconds until cancel() is called.
    See: https://hg.python.org/cpython/file/2.7/Lib/threading.py#l1079
    Note: subclass threading.Timer; the private threading._Timer alias used in the
    original snippet no longer exists on Python 3.
    """
    def run(self):
        while not self.finished.is_set():
            self.finished.wait(self.interval)
            if not self.finished.is_set():
                self.function(*self.args, **self.kwargs)

    Python Data Analysis

    A Note About Python Versions

    All examples in this cheat sheet use Python 3. We recommend using the latest stable version of Python, for example, Python 3.8. You can check which version you have installed on your machine by running the following command in the system shell: Sometimes, a development machine will have Python 2 and Python 3 installed side by side. Having two Python versions available is common on macOS. If that is the case for you, you can use the python3 command to run Python 3 even if Python 2 is the default in your environment: If you don’t have Python 3 installed yet, visit the Python Downloads page for instructions on installing it. Launch a Python interpreter by running the python3 command in your shell:
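The shell commands this paragraph refers to did not survive the copy. As a reminder (not taken from the original), python --version or python3 --version prints the installed version, and running python3 with no arguments starts the interpreter. From inside Python you can check the version like this:

import sys

print(sys.version)        # full version string, e.g. '3.8.10 (default, ...)'
print(sys.version_info)   # structured version information (major, minor, micro, ...)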

    Libraries and Imports

    The easiest way to install Python modules that are needed for data analysis is to use pip. Installing NumPy and Pandas takes only a few seconds: Once you’ve installed the modules, use the import statement to make the modules available in your program:
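The install and import lines the cheat sheet shows were stripped from this copy; a minimal sketch of what they usually look like (the conventional np/pd aliases are an assumption, not something stated above):

# In the shell: pip install numpy pandas
import numpy as np
import pandas as pd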

    Getting Help With Python Data Analysis Functions

    If you get stuck, the built-in Python docs are a great place to check for tips and ways to solve the problem. The Python help() function displays the help article for a method or a class: The help function uses the system text pagination program, also known as the pager, to display the documentation. Many systems use less as the default text pager, just in case you aren’t familiar with the Vi shortcuts here are the basics: Another useful place to check out for help articles is the online documentation for Python data analysis modules like Pandas and NumPy. For example, the Pandas user guides cover all the Pandas functionality with explanations and examples.
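As a small illustration (the exact example from the original cheat sheet is not preserved here), help() can be called on modules, classes, methods, and built-ins alike:

import pandas as pd

help(len)                   # help on a built-in function
help(pd.DataFrame.fillna)   # help on a Pandas method, shown in the pager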

    Basic language features

    A quick tour through the Python basics: There are many more useful string methods in Python, find out more about them in the Python string docs.
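The quick-tour snippets themselves were lost in this copy; the sketch below is my reconstruction, not the original, and covers the kind of basics the paragraph alludes to: variables, f-strings, a few string methods, and a list comprehension.

name = "pandas"
version = 1.5
print(f"{name} {version}")               # formatted string literal

s = "  Data Analysis with Python  "
print(s.strip())                         # remove surrounding whitespace
print(s.lower())                         # lowercase
print(s.replace("Python", "pandas"))     # substring replacement
print(",".join(["a", "b", "c"]))         # 'a,b,c'

squares = [x ** 2 for x in range(5)]     # list comprehension
counts = {"a": 1, "b": 2}                # dictionary
for key, value in counts.items():
    print(key, value)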

    Working with data sources

    Pandas provides a number of easy-to-use data import methods, including CSV and TSV import, copying from the system clipboard, and reading and writing JSON files. This is sufficient for most Python data analysis tasks: Find all other Pandas data import functions in the Pandas docs.
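A hedged sketch of the import/export calls the paragraph describes (the file names are placeholders, and pd.read_clipboard may need extra system packages such as xclip on Linux):

import pandas as pd

df = pd.read_csv("data.csv")                 # CSV
df_tsv = pd.read_csv("data.tsv", sep="\t")   # TSV: same reader, different separator
df_clip = pd.read_clipboard()                # whatever is currently on the clipboard
df_json = pd.read_json("data.json")          # JSON
df.to_json("out.json")                       # write JSON back out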

    Working with Pandas Data Frames

    Pandas data frames are a great way to explore, clean, tweak, and filter your data sets while doing data analysis in Python. This section covers a few of the things you can do with your Pandas data frames.

    Exploring data

    Here are a few functions that allow you to easily know more about the data set you are working on:
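A few representative calls, assuming a DataFrame named df has been loaded as above (this specific selection is mine, not the cheat sheet's):

df.head(10)     # first 10 rows
df.tail(5)      # last 5 rows
df.shape        # (number of rows, number of columns)
df.info()       # column names, dtypes, and non-null counts
df.dtypes       # data type of each column
df.describe()   # summary statistics for the numeric columns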

    Statistical operations

    All standard statistical operations like minimums, maximums, and custom quantiles are present in Pandas:
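For example, assuming df has a numeric column (the column name 'price' below is a placeholder):

df["price"].min()
df["price"].max()
df["price"].mean()
df["price"].median()
df["price"].std()
df["price"].quantile([0.25, 0.5, 0.75])   # custom quantiles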

    Cleaning the Data

    It is quite common to have not-a-number (NaN) values in your data set. To be able to operate on a data set with statistical methods, you’ll first need to clean up the data. The fillna and dropna Pandas functions are a convenient way to replace the NaN values with something more representative for your data set, for example, a zero, or to remove the rows with NaN values from the data frame.
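A minimal sketch of both approaches (zero and the column mean are just two possible replacement values):

df_filled = df.fillna(0)                               # replace every NaN with 0
df_filled = df.fillna({"price": df["price"].mean()})   # or with a per-column value
df_rows = df.dropna()                                  # drop rows containing any NaN
df_cols = df.dropna(axis=1)                            # drop columns containing any NaN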

    Filtering and sorting

    Here are some basic commands for filtering and sorting the data in your data frames.
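For instance, with the same placeholder column names as above:

df[df["price"] > 10]                           # boolean filtering
df[(df["price"] > 10) & (df["size"] == "L")]   # combine conditions with & / | and parentheses
df.sort_values("price")                        # ascending sort
df.sort_values("price", ascending=False)       # descending sort
df.sort_values(["size", "price"])              # sort by several columns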

    Machine Learning

    While machine learning algorithms can be incredibly complex, Python’s popular modules make creating a machine learning program straightforward. Below is an example of a simple ML algorithm that uses Python and its data analysis and machine learning modules, namely NumPy, TensorFlow, Keras, and SciKit-Learn. In this program, we generate a sample data set with pizza diameters and their respective prices, train the model on this data set, and then use the model to predict the price of a pizza of a diameter that we choose. Once the model is set up we can use it to predict a result: For more details on the functionality available in Pandas, visit the Pandas user guides. For more powerful math with NumPy (it can be used together with Pandas), check out the NumPy getting started guide.
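The code of the pizza example itself did not survive the copy. Below is a minimal reconstruction of the idea using only NumPy and scikit-learn (the original also mentions TensorFlow and Keras); the diameters and prices are made-up sample numbers, not data from the original:

import numpy as np
from sklearn.linear_model import LinearRegression

# Sample data set: pizza diameters (inches) and prices (dollars)
diameters = np.array([[8], [10], [12], [14], [18]])
prices = np.array([9.0, 11.0, 13.0, 17.5, 18.0])

model = LinearRegression()
model.fit(diameters, prices)                  # train the model on the sample data

prediction = model.predict(np.array([[20]]))  # predict the price of a 20-inch pizza
print(f"A 20-inch pizza should cost about ${prediction[0]:.2f}")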

pytest automated testing

    pytest

pandas Tutorial

If there is one package that is absolutely essential for data science, it is pandas. The most interesting thing about pandas is that it hides many packages inside it: it is a core package that bundles the functionality of several others, which is great because you can get your work done with pandas alone. pandas is the Excel of Python: it works with tables (DataFrames) and can apply all kinds of transformations to the data, plus much more. If you already know Python well, feel free to skip ahead to the third part. Let's get started!

The most basic pandas features

Reading data

data = pd.read_csv('my_file.csv')
data = pd.read_csv('my_file.csv', sep=';', encoding='latin-1', nrows=1000, skiprows=[2,5])

sep is the separator. If you work with French data, the CSV separator in Excel is ';', so you need to specify it explicitly. The encoding is set to latin-1 to read French characters. nrows=1000 reads only the first 1000 rows. skiprows=[2,5] skips rows 2 and 5 while reading the file. The most used functions: read_csv, read_excel. Other great ones: read_clipboard, read_sql.

Writing data: data.to_csv('my_new_file.csv', index=None). index=None writes the data exactly as it is; if you leave it out, you get an extra first column containing 1, 2, 3, ... down to the last row. I normally do not use the other writers such as .to_excel, .to_json or .to_pickle, because .to_csv does the job well and CSV is the most common way to save tables.

Checking the data

data.shape gives (#rows, #columns), i.e. the number of rows and columns. data.describe() computes basic statistics.

Viewing the data

data.head(3) prints the first 3 rows of the data; similarly, .tail() gives the last rows. data.loc[8] prints the row with index 8. data.loc[8, 'column_1'] prints the value in row 8 of the column named 'column_1'. data.loc[range(4, 6)] gives the subset of rows 4 to 6 (left-closed, right-open).

Basic pandas functions

Logical operations

data[data['column_1'] == 'french']
data[(data['column_1'] == 'french') & (data['year_born'] == 1990)]
data[(data['column_1'] == 'french') & (data['year_born'] == 1990) & ~(data['city'] == 'London')]

Take subsets of the data with logical conditions. To combine conditions with & (AND), ~ (NOT) and | (OR), each condition must be wrapped in parentheses. data[data['column_1'].isin(['french', 'english'])]: instead of chaining several ORs on the same column, you can use the .isin() function.

Basic plotting: the matplotlib package makes this possible and, as mentioned in the introduction, it can be used directly from pandas. data['column_numerical'].plot() draws the series (example of .plot() output). data['column_numerical'].hist() draws the distribution as a histogram (example of .hist() output). %matplotlib inline: if you are using Jupyter, do not forget this line before plotting.

Updating the data

data.loc[8, 'column_1'] = 'english' replaces the value in row 8 of column_1 with 'english'. data.loc[data['column_1'] == 'french', 'column_1'] = 'French' changes the values of several rows in one line of code. Good, now you can do the things that are easy to reach in Excel. Let's dig into the amazing operations that Excel cannot do.

Intermediate functions

Counting occurrences: data['column_1'].value_counts() (example of .value_counts() output).

Operating on all rows, all columns, or the whole DataFrame

data['column_1'].map(len) applies the len() function to every element of 'column_1'; .map() applies a function to each element of a column. data['column_1'].map(len).map(lambda x: x/100).plot(): one of pandas' nice features is method chaining (https://tomaugspurger.github.io/method-chaining), which lets you carry out several operations (.map() and .plot()) in a single, more concise line. data.apply(sum): .apply() applies a function to each column, and .applymap() applies a function to every cell of the DataFrame.

tqdm, the one and only

When working with large data sets, pandas can take some time to run .map(), .apply() or .applymap(). tqdm is a package that helps you predict when these operations will finish (yes, I lied: I said we would only use pandas).

from tqdm import tqdm_notebook
tqdm_notebook().pandas()

sets up tqdm with pandas, and data['column_1'].progress_map(lambda x: x.count('e')) replaces .map() with .progress_map(); the same works for .apply() and .applymap(). (Progress bar shown when using tqdm with pandas in Jupyter.)

Correlation and scatter matrices

data.corr()
data.corr().applymap(lambda x: int(x*100)/100)

.corr() gives the correlation matrix. pd.plotting.scatter_matrix(data, figsize=(12,8)) gives a scatter matrix: it plots every combination of two columns in the same figure.

Advanced pandas operations

SQL-style joins

Joins are very, very simple in pandas: data.merge(other_data, on=['column_1', 'column_2', 'column_3']) joins on three columns in a single line of code.

Grouping

Not so simple at first: you first need to get the syntax down, and then you will find yourself using it all the time. data.groupby('column_1')['column_2'].apply(sum).reset_index() groups by one column and applies a function to another column; .reset_index() turns the result back into a DataFrame. As explained before, chain your functions on one line to keep the code tight.

Row iteration

dictionary = {}
for i, row in data.iterrows():
    dictionary[row['column_1']] = row['column_2']

.iterrows() loops with two variables: the row index and the row data (i and row above). All in all, pandas is one of the reasons Python is such a great programming language. I could have shown many more interesting pandas features, but this is already enough to understand why a data scientist cannot work without pandas. To sum up, pandas is: easy to use, hiding all the complex and abstract computation behind it; intuitive; and fast, if not the fastest. It helps data scientists read and understand data quickly, and makes them more productive. Original article: https://towardsdatascience.com/be-a-more-efficient-data-scientist-today-master-pandas-with-this-guide-ea362d27386

    Python Beautiful Web Scraping

https://www.youtube.com/c/KGMIT/playlists Keith Galli https://www.youtube.com/watch?v=GjKQ6V_ViQE Comprehensive Python Beautiful Soup Web Scraping Tutorial! (find/find_all, css select, scrape table) https://github.com/KeithGalli/web-scraping/blob/master/web_scraping_tutorial.ipynb SAMPLE CODE https://www.youtube.com/watch?v=zucvHSQsKHA&t=241s Python Web Scraping - Should I use Selenium, Beautiful Soup or Scrapy? [2020] https://www.digitalocean.com/community/tutorials/how-to-crawl-a-web-page-with-scrapy-and-python-3 How To Crawl A Web Page with Scrapy and Python 3

import requests
from bs4 import BeautifulSoup

URL = "http://www.values.com/inspirational-quotes"
r = requests.get(URL)
soup = BeautifulSoup(r.content, 'html5lib')   # if this line causes an error, run 'pip install html5lib'
print(soup.prettify())

import csv   # needed below for csv.DictWriter
import requests
from bs4 import BeautifulSoup

URL = "http://www.values.com/inspirational-quotes"
r = requests.get(URL)
soup = BeautifulSoup(r.content, 'html5lib')
quotes = []   # a list to store quotes
table = soup.find('div', attrs={'id': 'all_quotes'})
for row in table.findAll('div', attrs={'class': 'col-6 col-lg-3 text-center margin-30px-bottom sm-margin-30px-top'}):
    quote = {}
    quote['theme'] = row.h5.text
    quote['url'] = row.a['href']
    quote['img'] = row.img['src']
    quote['lines'] = row.img['alt'].split(" #")[0]
    quote['author'] = row.img['alt'].split(" #")[1]
    quotes.append(quote)

filename = 'inspirational_quotes.csv'
with open(filename, 'w', newline='') as f:
    w = csv.DictWriter(f, ['theme', 'url', 'img', 'lines', 'author'])
    w.writeheader()
    for quote in quotes:
        w.writerow(quote)

rembg: an open-source Python tool for batch background removal

Overview: rembg is a Python tool for removing image backgrounds. It requires Python 3.8 or newer, supports batch operation, and is quite flexible: you can run it from the command line, run it as a service, use it inside docker, or call it as a library.

Download and install: the source code lives at https://github.com/danielgatis/rembg . It requires Python 3.8 or newer; install it with:

pip install rembg

Basic usage

1. From the command line you can operate on remote images, local images, or an entire folder:
(1) a remote image: curl -s http://input.png | rembg > output.png
(2) a local image: rembg -o path/to/output.png path/to/input.png
(3) all images in a folder: rembg -p path/to/inputs

2. As a service
(1) start the service: rembg-server
(2) if the image is reachable by URL, open the following address directly in a browser: http://localhost:5000?url=http://image.png
You can also upload a file through an HTML form:

<form action="http://localhost:5000" method="post" enctype="multipart/form-data">
  <input type="file" name="file"/>
  <input type="submit" value="upload"/>
</form>

3. In docker, simply run:

curl -s http://input.png | docker run -i -v ~/.u2net:/root/.u2net danielgatis/rembg:latest > output.png

4. As a library. Calling it from a script is also easy; create app.py with the following content:

import sys
from rembg.bg import remove
sys.stdout.buffer.write(remove(sys.stdin.buffer.read()))

then run it with:

cat input.png | python app.py > out.png

An example result from the project is shown below. Advanced usage: for some images the alpha matting mode (the -a -ae 15 options) gives a cleaner result:

curl -s http://input.png | rembg -a -ae 15 > output.png

A before/after comparison is shown below. That wraps up this short introduction to rembg.

    Create Proxy In Python

    Import Libraries:
    Get Requests:
    Removing the URL slash
    Sending The Headers
    Using The TCP Server:
    Types Of Proxy Servers:
    Rotating Proxies:
    Uses Of Proxies:
    The Best Proxy for Your Online Tasks:

    Steps:

    Import Libraries:


    SimpleWebSocketServer simple_http_server urllib from simple_websocket_server import WebSocketServer, WebSocket import simple_http_server import urllib PORT = 9097 The SimpleWebSocketServer and the simple_http_server listen to the incoming requests, and the urllib module fetches the target web pages. We can also initialize the port, as shown below.

    Get Requests:


    We define a function do_GET that will be called for all GET requests. class MyProxy(simple_http_server.SimpleHTTPRequestHandler): def do_GET(self): url=self.path[1:] self.send_response(200) self.end_headers() self.copyfile(urllib.urlopen(url), self.wfile)

    Removing the URL slash


    The URL that we pass in the above code will have a slash (/) at the beginning from the browsers. We can remove the slash using the below code. url=self.path[1:]

    Sending The Headers


    We have to send the headers as browsers need them for reporting a successful fetch with the HTTP status code of 200. self.send_response(200) self.end_headers() self.copyfile(urllib.urlopen(url), self.wfile) We used the urllib library in the last line to fetch the URL. We wrote the URL back to the browser using the copyfile function.

    Using The TCP Server:


We will use the ForkingTCPServer mode and pass it the above class for interrupt handling.

httpd = WebSocketServer.ForkingTCPServer(('', PORT), MyProxy)
httpd.serve_forever()

You can save your file as ProxyServer.py and run it, then call it from the browser. The whole code looks like this (note that it is Python 2 style, as in the original article: on Python 3, urlopen lives in urllib.request):

from simple_websocket_server import WebSocketServer, WebSocket
import simple_http_server
import urllib

PORT = 9097

class MyProxy(simple_http_server.SimpleHTTPRequestHandler):
    def do_GET(self):
        url = self.path[1:]
        self.send_response(200)
        self.end_headers()
        self.copyfile(urllib.urlopen(url), self.wfile)

httpd = WebSocketServer.ForkingTCPServer(('', PORT), MyProxy)
print("Now serving at " + str(PORT))
httpd.serve_forever()

    Types Of Proxy Servers:


    Anonymous Proxy:

    Whenever we type an address on our browser, our device sends a request to the web host of our destination website. When the web host receives the request, it sends the web page of our target website back to our device. The web host only sends the page back to us if it knows our internet protocol, i.e., IP address. Thus, the target website knows the general location from where we are browsing because we sent out our IP address when we requested to browse the website. Most likely, the web host may be able to access our ISP (Internet Service Provider) account name with the help of our IP address.

    Advantages Of Using An Anonymous Proxy

    There are lots of advantages to using an anonymous proxy server. We must be aware of its benefits to understand how it can help us in our organization or any business. Following are some of the pros of using anonymous proxy servers: The most obvious benefit of anonymous proxy servers is that they give us some semblance of privacy. It essentially substitutes its IP address in place of ours and allows us to bypass geo-blocking. For instance, a video streaming website provides access to viewers of specific countries and blocks requests from other countries. We can bypass this restriction by connecting to a proxy server in any country to access the video streaming website. Public WiFi may prevent us from browsing certain websites at some universities or offices. We can get around this browsing restriction by using a proxy server. An anonymous proxy server helps clients protect their vital information from hacking. A proxy server is often used to access data, speeding up browsing because of its good cache system.

    Rotating Proxies:


    We can define proxy rotation as a feature that changes our IP address with every new request we send. When we visit a website, we send a request that shows a destination server a lot of data, including our IP address. For instance, we send many such requests when we gather data using a scraper( for generating leads). So, the destination server gets suspicious and bans it when most requests come from the same IP. Therefore, there must be a solution to change our IP address with each request we send. That solution is a rotating proxy. So, to avoid the needless hassle of getting a scraper for rotating IPs in web scraping, we can get rotating proxies and let our provider take care of the rotation.

    Uses Of Proxies:


    Web Scraping E-commerce websites employ anti-scraping tools for monitoring IP addresses to detect those making multiple web requests. It is where the use of proxies comes in. They enable users to make several requests that have ordinarily been detected from different IP addresses. Each web request is assigned a different IP address. In this way, the webserver is tricked and thinks that all the web requests come from other devices. Ad Verification Ad verification allows advertisers to check if their ads are displayed on the right websites and seen by the right audiences. The constant change of IP addresses accesses many different websites and thus verifies ads without IP blocks. Accessing geo-restricted websites and data The same content can look different or unavailable when accessed from specific locations. The proxies allow us to access the necessary data regardless of geo-location.

    The Best Proxy for Your Online Tasks:


    ProxyScrape is one of the most popular and reliable proxy providers online. Three proxy services include dedicated datacentre proxy servers, residential proxy servers, and premium proxy servers. So, what is the best possible solution for a best alternate solution for how to create a proxy in python? Before answering that questions, it is best to see the features of each proxy server. A dedicated datacenter proxy is best suited for high-speed online tasks, such as streaming large amounts of data (in terms of size) from various servers for analysis purposes. It is one of the main reasons organizations choose dedicated proxies for transmitting large amounts of data in a short amount of time. A dedicated datacenter proxy has several features, such as unlimited bandwidth and concurrent connections, dedicated HTTP proxies for easy communication, and IP authentication for more security. With 99.9% uptime, you can rest assured that the dedicated datacenter will always work during any session. Last but not least, ProxyScrape provides excellent customer service and will help you to resolve your issue within 24-48 business hours. Next is a residential proxy. Residential is a go-to proxy for every general consumer. The main reason is that the IP address of a residential proxy resembles the IP address provided by ISP. This means getting permission from the target server to access its data will be easier than usual. The other feature of ProxyScrape’s residential proxy is a rotating feature. A rotating proxy helps you avoid a permanent ban on your account because your residential proxy dynamically changes your IP address, making it difficult for the target server to check whether you are using a proxy or not. Apart from that, the other features of a residential proxy are: unlimited bandwidth, along with concurrent connection, dedicated HTTP/s proxies, proxies at any time session because of 7 million plus proxies in the proxy pool, username and password authentication for more security, and last but not least, the ability to change the country server. You can select your desired server by appending the country code to the username authentication. The last one is the premium proxy. Premium proxies are the same as dedicated datacenter proxies. The functionality remains the same. The main difference is accessibility. In premium proxies, the proxy list (the list that contains proxies) is made available to every user on ProxyScrape’s network. That is why premium proxies cost less than dedicated datacenter proxies. So, what is the best possible solution for the best alternate solution for how to create a proxy in python? The answer would be “residential proxy” and "dedicated datacenter proxy" The reason is simple. As said above, the residential proxy is a rotating proxy, meaning that your IP address would be dynamically changed over a period of time which can be helpful to trick the server by sending a lot of requests within a small time frame without getting an IP block. Next, the best thing would be to change the proxy server based on the country. You just have to append the country ISO_CODE at the end of the IP authentication or username and password authentication. Datacenter proxy is blazing fast, and if you are an avid movie buff, then a datacenter proxy is the best companion to stream high-quality videos.

    Python Creating Proxy Webserver

    1.Creating an incoming socket
    2.Accept client and process
    3. Redirecting the traffic
    How to test the server?
    features are added
    Add blacklisting of domains.
    To add host blocking:
    Using regex to match correct IP addresses:
    Import module and setup its initial configuration.
    Create a new module, ColorizePython.py
    https://www.geeksforgeeks.org/creating-a-proxy-webserver-in-python-set-1/ Socket programming in python is very user friendly as compared to c. The programmer need not worry about minute details regarding sockets. In python, the user has more chance of focusing on the application layer rather than the network layer. We would be developing a simple multi-threaded proxy server capable of handling HTTP traffic. This is a naive implementation of a proxy server. To begin with, we would achieve the process in 3 easy steps

    1.Creating an incoming socket


    We create a socket serverSocket in the __init__ method of the Server Class. This creates a socket for the incoming connections. We then bind the socket and then wait for the clients to connect. def __init__(self, config): # Shutdown on Ctrl+C signal.signal(signal.SIGINT, self.shutdown) # Create a TCP socket self.serverSocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM) # Re-use the socket self.serverSocket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1) # bind the socket to a public host, and a port self.serverSocket.bind((config['HOST_NAME'], config['BIND_PORT'])) self.serverSocket.listen(10) # become a server socket self.__clients = {}

    2.Accept client and process


    This is the easiest yet the most important of all the steps. We wait for the client’s connection request and once a successful connection is made, we dispatch the request in a separate thread, making ourselves available for the next request. This allows us to handle multiple requests simultaneously which boosts the performance of the server multifold times. while True: # Establish the connection (clientSocket, client_address) = self.serverSocket.accept() d = threading.Thread(name=self._getClientName(client_address), target = self.proxy_thread, args=(clientSocket, client_address)) d.setDaemon(True) d.start()

    3. Redirecting the traffic


    The main feature of a proxy server is to act as an intermediate between source and destination. Here, we would be fetching data from source and then pass it to the client. First, we extract the URL from the received request data. # get the request from browser request = conn.recv(config['MAX_REQUEST_LEN']) # parse the first line first_line = request.split('\n')[0] # get url url = first_line.split(' ')[1] Then, we find the destination address of the request. Address is a tuple of (destination_ip_address, destination_port_no). We will be receiving data from this address. http_pos = url.find("://") # find pos of :// if (http_pos==-1): temp = url else: temp = url[(http_pos+3):] # get the rest of url port_pos = temp.find(":") # find the port pos (if any) # find end of web server webserver_pos = temp.find("/") if webserver_pos == -1: webserver_pos = len(temp) webserver = "" port = -1 if (port_pos==-1 or webserver_pos < port_pos): # default port port = 80 webserver = temp[:webserver_pos] else: # specific port port = int((temp[(port_pos+1):])[:webserver_pos-port_pos-1]) webserver = temp[:port_pos] Now, we setup a new connection to the destination server (or remote server), and then send a copy of the original request to the server. The server will then respond with a response. All the response messages use the generic message format of RFC 822. s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) s.settimeout(config['CONNECTION_TIMEOUT']) s.connect((webserver, port)) s.sendall(request) We then redirect the server’s response to the client. conn is the original connection to the client. The response may be bigger than MAX_REQUEST_LEN that we are receiving in one call, so, a null response marks the end of the response. while 1: # receive data from web server data = s.recv(config['MAX_REQUEST_LEN']) if (len(data) > 0): conn.send(data) # send to browser/client else: break We then close the server connections appropriately and do the error handling to make sure the server works as expected.

    How to test the server?


1. Run the server on a terminal. Keep it running and switch to your favorite browser. 2. Go to your browser's proxy settings and change the proxy server to 'localhost' and the port to '12345'. 3. Now open any HTTP website (not HTTPS), e.g. geeksforgeeks.org, and voila! You should be able to access the content in the browser. Once the server is running, we can monitor the requests coming from the client. We can use that data to monitor the content that passes through, or build statistics on it. We can even restrict access to a website or blacklist an IP address. We will deal with more such features in the upcoming tutorials. What next? We will add the following features to our proxy server in the upcoming tutorials: blacklisting domains, content monitoring, logging, HTTP WebServer + ProxyServer. The whole working source code of this tutorial is available here: Creating a Proxy Webserver in Python | Set 2. If you have any questions/comments, feel free to post them in the comments section.

    features are added


    A few interesting features are added to make it more useful.

    Add blacklisting of domains.


    For Ex. google.com, facebook.com. Create a list of BLACKLIST_DOMAINS in our configuration dict. For now, just ignore/drop the requests received for blacklisted domains. (Ideally, we must respond with a forbidden response.) # Check if the host:port is blacklisted for i in range(0, len(config['BLACKLIST_DOMAINS'])): if config['BLACKLIST_DOMAINS'][i] in url: conn.close() return

    To add host blocking:


    Say, you may need to allow connections from a particular subnet or connection for a particular person. To add this, create a list of all the allowed hosts. Since the hosts can be a subnet as well, add regex for matching the IP addresses, specifically IPV4 addresses. “ IPv4 addresses are canonically represented in dot-decimal notation, which consists of four decimal numbers, each ranging from 0 to 255, separated by dots, e.g., 172.16.254.1. Each part represents a group of 8 bits (octet) of the address.”

    Using regex to match correct IP addresses:


    Create a new method,

    _ishostAllowed in Server class, and use the fnmatch module to match regexes. Iterate through all the regexes and allow request if it matches any of them. If a client address is not found to be a part of any regex, then send a FORBIDDEN response. Again, for now, skip this response creation part. Note: We would be creating a full-fledged custom webserver in upcoming tutorials, their creation of a createResponse function will be done to handle the generic response creation. def _ishostAllowed(self, host): """ Check if host is allowed to access the content """ for wildcard in config['HOST_ALLOWED']: if fnmatch.fnmatch(host, wildcard): return True return False Default host match regex would be ‘*’ to match all the hosts. Though, regex of the form ‘192.168.*’ can also be used. The server currently processes requests but does not show any messages, so we are not aware of the state of the server. Its messages should be logged onto the console. For this purpose, use the logging module as it is thread-safe. (server is multi-threaded if you remember.)

    Import module and setup its initial configuration.


    logging.basicConfig(level = logging.DEBUG, format = '[%(CurrentTime)-10s] (%(ThreadName)-10s) %(message)s',)

    Create a separate method that logs every message:

    Pass it as an argument, with additional data such as thread-name and current-time to keep track of the logs. Also, create a function that colorizes the logs so that they look pretty on STDOUT. To achieve this, add a boolean in configuration, COLORED_LOGGING, and create a new function that colorizes every msg passed to it based on the LOG_LEVEL. def log(self, log_level, client, msg): """ Log the messages to appropriate place """ LoggerDict = { 'CurrentTime' : strftime("%a, %d %b %Y %X", localtime()), 'ThreadName' : threading.currentThread().getName() } if client == -1: # Main Thread formatedMSG = msg else: # Child threads or Request Threads formatedMSG = '{0}:{1} {2}'.format(client[0], client[1], msg) logging.debug('%s', utils.colorizeLog(config['COLORED_LOGGING'], log_level, formatedMSG), extra=LoggerDict)

    Create a new module, ColorizePython.py


    It contains a pycolors class that maintains a list of color codes. Separate this into another module in order to make code modular and to follow PEP8 standards. # ColorizePython.py class pycolors: HEADER = '\033[95m' OKBLUE = '\033[94m' OKGREEN = '\033[92m' WARNING = '\033[93m' FAIL = '\033[91m' ENDC = '\033[0m' # End color BOLD = '\033[1m' UNDERLINE = '\033[4m'

    Module:

    import ColorizePython

    Method:

    def colorizeLog(shouldColorize, log_level, msg): ## Higher is the log_level in the log() ## argument, the lower is its priority. colorize_log = { "NORMAL": ColorizePython.pycolors.ENDC, "WARNING": ColorizePython.pycolors.WARNING, "SUCCESS": ColorizePython.pycolors.OKGREEN, "FAIL": ColorizePython.pycolors.FAIL, "RESET": ColorizePython.pycolors.ENDC } if shouldColorize.lower() == "true": if log_level in colorize_log: return colorize_log[str(log_level)] + msg + colorize_log['RESET'] return colorize_log["NORMAL"] + msg + colorize_log["RESET"] return msg Since the colorizeLog is not a function of a server-class, it is created as a separate module named utils.py which stores all the utility that makes code easier to understand and put this method there. Add appropriate log messages wherever required, especially whenever the state of the server changes. Modify the shutdown method in the server to exit all the running threads before exiting the application. threading.enumerate() iterates over all the running threads, so we do not need to maintain a list of them. The behavior of the threading module is unexpected when we try to end the main_thread. The official documentation also states this: “join() raises a RuntimeError if an attempt is made to join the current thread as that would cause a deadlock. It is also an error to join() a thread before it has been started and attempts to do so raises the same exception.” So, skip it appropriately. Here’s the code for the same. def shutdown(self, signum, frame): """ Handle the exiting server. Clean all traces """ self.log("WARNING", -1, 'Shutting down gracefully...') main_thread = threading.currentThread() # Wait for all clients to exit for t in threading.enumerate(): if t is main_thread: continue self.log("FAIL", -1, 'joining ' + t.getName()) t.join() self.serverSocket.close() sys.exit(0) Build simple proxy server in Python Build Simple proxy in Python in just 17 lines of code

    OpenCV Python Tutorial

    OpenCV Python Tutorial import cv2 img = cv2.imread('assets/logo.jpg', 1) img = cv2.resize(img, (0, 0), fx=0.5, fy=0.5) img = cv2.rotate(img, cv2.cv2.ROTATE_90_CLOCKWISE) cv2.imwrite('new_img.jpg', img) cv2.imshow('Image', img) cv2.waitKey(0) cv2.destroyAllWindows() import cv2 import random img = cv2.imread('assets/logo.jpg', -1) # Change first 100 rows to random pixels for i in range(100): for j in range(img.shape[1]): img[i][j] = [random.randint(0, 255), random.randint(0, 255), random.randint(0, 255)] # Copy part of image tag = img[500:700, 600:900] img[100:300, 650:950] = tag cv2.imshow('Image', img) cv2.waitKey(0) cv2.destroyAllWindows() import numpy as np import cv2 cap = cv2.VideoCapture(0) while True: ret, frame = cap.read() width = int(cap.get(3)) height = int(cap.get(4)) image = np.zeros(frame.shape, np.uint8) smaller_frame = cv2.resize(frame, (0, 0), fx=0.5, fy=0.5) image[:height//2, :width//2] = cv2.rotate(smaller_frame, cv2.cv2.ROTATE_180) image[height//2:, :width//2] = smaller_frame image[:height//2, width//2:] = cv2.rotate(smaller_frame, cv2.cv2.ROTATE_180) image[height//2:, width//2:] = smaller_frame cv2.imshow('frame', image) if cv2.waitKey(1) == ord('q'): break cap.release() cv2.destroyAllWindows() import numpy as np import cv2 cap = cv2.VideoCapture(0) while True: ret, frame = cap.read() width = int(cap.get(3)) height = int(cap.get(4)) img = cv2.line(frame, (0, 0), (width, height), (255, 0, 0), 10) img = cv2.line(img, (0, height), (width, 0), (0, 255, 0), 5) img = cv2.rectangle(img, (100, 100), (200, 200), (128, 128, 128), 5) img = cv2.circle(img, (300, 300), 60, (0, 0, 255), -1) font = cv2.FONT_HERSHEY_SIMPLEX img = cv2.putText(img, 'Tim is Great!', (10, height - 10), font, 2, (0, 0, 0), 5, cv2.LINE_AA) cv2.imshow('frame', img) if cv2.waitKey(1) == ord('q'): break cap.release() cv2.destroyAllWindows() import numpy as np import cv2 cap = cv2.VideoCapture(0) while True: ret, frame = cap.read() width = int(cap.get(3)) height = int(cap.get(4)) hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV) lower_blue = np.array([90, 50, 50]) upper_blue = np.array([130, 255, 255]) mask = cv2.inRange(hsv, lower_blue, upper_blue) result = cv2.bitwise_and(frame, frame, mask=mask) cv2.imshow('frame', result) cv2.imshow('mask', mask) if cv2.waitKey(1) == ord('q'): break cap.release() cv2.destroyAllWindows() import numpy as np import cv2 img = cv2.imread('assets/chessboard.png') img = cv2.resize(img, (0, 0), fx=0.75, fy=0.75) gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) corners = cv2.goodFeaturesToTrack(gray, 100, 0.01, 10) corners = np.int0(corners) for corner in corners: x, y = corner.ravel() cv2.circle(img, (x, y), 5, (255, 0, 0), -1) for i in range(len(corners)): for j in range(i + 1, len(corners)): corner1 = tuple(corners[i][0]) corner2 = tuple(corners[j][0]) color = tuple(map(lambda x: int(x), np.random.randint(0, 255, size=3))) cv2.line(img, corner1, corner2, color, 1) cv2.imshow('Frame', img) cv2.waitKey(0) cv2.destroyAllWindows() import numpy as np import cv2 img = cv2.resize(cv2.imread('assets/soccer_practice.jpg', 0), (0, 0), fx=0.8, fy=0.8) template = cv2.resize(cv2.imread('assets/shoe.PNG', 0), (0, 0), fx=0.8, fy=0.8) h, w = template.shape methods = [cv2.TM_CCOEFF, cv2.TM_CCOEFF_NORMED, cv2.TM_CCORR, cv2.TM_CCORR_NORMED, cv2.TM_SQDIFF, cv2.TM_SQDIFF_NORMED] for method in methods: img2 = img.copy() result = cv2.matchTemplate(img2, template, method) min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result) if method in [cv2.TM_SQDIFF, cv2.TM_SQDIFF_NORMED]: location = min_loc else: 
location = max_loc bottom_right = (location[0] + w, location[1] + h) cv2.rectangle(img2, location, bottom_right, 255, 5) cv2.imshow('Match', img2) cv2.waitKey(0) cv2.destroyAllWindows() import numpy as np import cv2 cap = cv2.VideoCapture(0) face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml') eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_eye.xml') while True: ret, frame = cap.read() gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) faces = face_cascade.detectMultiScale(gray, 1.3, 5) for (x, y, w, h) in faces: cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 5) roi_gray = gray[y:y+w, x:x+w] roi_color = frame[y:y+h, x:x+w] eyes = eye_cascade.detectMultiScale(roi_gray, 1.3, 5) for (ex, ey, ew, eh) in eyes: cv2.rectangle(roi_color, (ex, ey), (ex + ew, ey + eh), (0, 255, 0), 5) cv2.imshow('frame', frame) if cv2.waitKey(1) == ord('q'): break cap.release() cv2.destroyAllWindows() OpenCV-Python Tutorials Values for OpenCV detectMultiScale() parameters

    Python3 find the circle center from 3 pts

from math import sqrt

def findCircle(x1, y1, x2, y2, x3, y3):
    x12 = x1 - x2
    x13 = x1 - x3
    y12 = y1 - y2
    y13 = y1 - y3
    y31 = y3 - y1
    y21 = y2 - y1
    x31 = x3 - x1
    x21 = x2 - x1

    sx13 = pow(x1, 2) - pow(x3, 2)   # x1^2 - x3^2
    sy13 = pow(y1, 2) - pow(y3, 2)   # y1^2 - y3^2
    sx21 = pow(x2, 2) - pow(x1, 2)
    sy21 = pow(y2, 2) - pow(y1, 2)

    # use true division (/) here so the centre is not rounded towards an integer
    f = (sx13 * x12 + sy13 * x12 + sx21 * x13 + sy21 * x13) / (2 * (y31 * x12 - y21 * x13))
    g = (sx13 * y12 + sy13 * y12 + sx21 * y13 + sy21 * y13) / (2 * (x31 * y12 - x21 * y13))
    c = -pow(x1, 2) - pow(y1, 2) - 2 * g * x1 - 2 * f * y1

    # the circle equation is x^2 + y^2 + 2*g*x + 2*f*y + c = 0
    # with centre (h = -g, k = -f) and radius r, where r^2 = h^2 + k^2 - c
    h = -g
    k = -f
    sqr_of_r = h * h + k * k - c
    r = round(sqrt(sqr_of_r), 5)   # r is the radius
    print("Centre = (", h, ", ", k, ")")
    print("Radius = ", r)

# Driver code
if __name__ == "__main__":
    x1 = 1; y1 = 1
    x2 = 2; y2 = 4
    x3 = 5; y3 = 3
    findCircle(x1, y1, x2, y2, x3, y3)

    Finding the “center of gravity” of multiple points

where the points have unequal weights:

import math
import numpy

def toCartesian(t):
    latD, longD = t
    latR = math.radians(latD)
    longR = math.radians(longD)
    return (math.cos(latR) * math.cos(longR), math.cos(latR) * math.sin(longR), math.sin(latR))

def toSpherical(t):
    x, y, z = t
    r = math.hypot(x, y)
    if r == 0:
        if z > 0:
            return (90, 0)
        elif z < 0:
            return (-90, 0)
        else:
            return None
    else:
        return (math.degrees(math.atan2(z, r)), math.degrees(math.atan2(y, x)))

xyz = numpy.asarray([0.0, 0.0, 0.0])
total = 0
for p in points:
    weight = p["weight"]
    total += weight
    xyz += numpy.asarray(toCartesian((p["lat"], p["long"]))) * weight
avgXYZ = xyz / total
avgLat, avgLong = toSpherical(avgXYZ)
print(avgLat, avgLong)

    django find center of points

    https://stackoverflow.com/questions/6671183/calculate-the-center-point-of-multiple-latitude-longitude-coordinate-pairs from django.contrib.gis.geos import Point, MultiPoint points = [ Point((145.137075, -37.639981)), Point((144.137075, -39.639981)), ] multipoint = MultiPoint(*points) point = multipoint.centroid

    myro

    Interactive Graphics in Python myro Python Graphics Programming

Extracting PDF Content with Python

Simple text-style tables:

import pdfplumber as pr
import pandas as pd

pdf = pr.open('关于使用自有资金购买银行理财产品的进展公告.PDF')
ps = pdf.pages
pg = ps[3]
tables = pg.extract_tables()
table = tables[0]
print(table)
df = pd.DataFrame(table[1:], columns=table[0])
for i in range(len(table)):
    for j in range(len(table[i])):
        table[i][j] = table[i][j].replace('\n', '')
df1 = pd.DataFrame(table[1:], columns=table[0])
df1.to_excel('page2.xlsx')

Extracting a more complex table:

import pdfplumber as pr
import pandas as pd

pdf = pr.open('关于使用自有资金购买银行理财产品的进展公告.PDF')
ps = pdf.pages
pg = ps[4]
tables = pg.extract_tables()
table = tables[0]
print(table)
df = pd.DataFrame(table[1:], columns=table[0])
for i in range(len(table)):
    for j in range(len(table[i])):
        table[i][j] = table[i][j].replace('\n', '')
df1 = pd.DataFrame(table[1:], columns=table[0])
df2 = df1.iloc[2:, :]
df2 = df2.rename(columns={"2019年12月31日": "2019年1-12月", "2020年9月30日": "2020年1-9月"})
df2 = df2.loc[3:, :]
df1 = df1.loc[:1, :]
with pd.ExcelWriter('公司影响.xlsx') as i:
    df1.to_excel(i, sheet_name='资产', index=False, header=True)   # asset data
    df2.to_excel(i, sheet_name='营业', index=False, header=True)   # revenue data

Extracting image-based tables: first install pytesseract (pip install pytesseract) and the Tesseract binary, e.g. from http://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-setup-4.00.00dev.exe

import pytesseract
from PIL import Image
import pandas as pd

pytesseract.pytesseract.tesseract_cmd = 'C://Program Files (x86)/Tesseract-OCR/tesseract.exe'
tiqu = pytesseract.image_to_string(Image.open('图片型.jpg'))
print(tiqu)
tiqu = tiqu.split('\n')
while '' in tiqu:   # a for loop cannot be used here, because removing items while iterating skips elements
    tiqu.remove('')
first = tiqu[:6]
second = tiqu[6:12]
third = tiqu[12:]
df = pd.DataFrame()
df[first[0]] = first[1:]
df[second[0]] = second[1:]
df[third[0]] = third[1:]
# df.to_excel('图片型表格.xlsx')   # save as an xlsx file

The idea is to parse the image with Tesseract-OCR to obtain a string, split that string into a list, and drop the empty entries left over from the '\n' separators.

    encrypt and decrypt a string in python

Use cryptography.fernet.Fernet. Initialize a cryptographic key by calling cryptography.fernet.Fernet.generate_key(). Configure symmetric encryption by calling cryptography.fernet.Fernet(key) with the key from step 1. Encrypt the string by calling encrypt(data) with data as the byte representation of a string. Decrypt an encrypted string by using the key from step 1 and the Fernet object from step 2: call decrypt(token) with the encrypted message as token to get the original message back.

from cryptography.fernet import Fernet

key = Fernet.generate_key()
encryption_type = Fernet(key)
encrypted_message = encryption_type.encrypt(b"Hello World")   # encode the message
print(encrypted_message)

OUTPUT
b'gAAAAABefl-Ur385W0q0YNZM7rbUL_ImiFKBI05hEMIqhgf4FeUKyZFDUzIi3tqnCt6N4mAR2o8-ryPOOyJH32bvZEVjAG-YLg=='

decrypted_message = encryption_type.decrypt(encrypted_message)

    Load a file into the python console

    From the shell command line: python file.py From the Python command line import file or from file import *

    print colored text to the terminal

    # install the Python termcolor module from termcolor import colored in Python 3: print(colored('hello', 'red'), colored('world', 'green'))

1. Merging two dictionaries

Since Python 3.5, merging dictionaries has become easy: we can unpack dictionaries with the ** operator and pass several of them into {} to merge them.

def Merge(dict1, dict2):
    res = {**dict1, **dict2}
    return res

# two dictionaries
dict1 = {"name": "Joy", "age": 25}
dict2 = {"name": "Joy", "city": "New York"}
dict3 = Merge(dict1, dict2)
print(dict3)

Output: {'name': 'Joy', 'age': 25, 'city': 'New York'}

2. Chained comparisons

Python supports chained comparisons, i.e. several comparison operators in a single expression, which is equivalent to splitting them into separate logical expressions joined with AND.

a = 5
print(2 < a < 8)
print(1 == a < 3)

Output:
True
False

3. Repeating a string

Printing a string several times is usually done with a loop, but there is a simpler way:

n = 5
string = "Hello!"
print(string * n)

Output: Hello!Hello!Hello!Hello!Hello!

4. Checking whether a file exists

Python's os module handles interaction with the operating system, including creating, deleting, modifying and inspecting files. Checking whether a file exists is easy:

from os import path

def check_for_file():
    print("Does file exist:", path.exists("data.csv"))

if __name__ == "__main__":
    check_for_file()

Output: Does file exist: False

5. Getting the last element of a list

When you need the last element of a list, there are several ways to get it:

my_list = ['banana', 'apple', 'orange', 'pineapple']

# indexing
last_element = my_list[-1]

# pop method (note: this also removes the element from the list)
last_element = my_list.pop()

Output: 'pineapple'

6. List comprehensions

A list comprehension is a compact form of a for loop that builds a new list in one line of code, optionally filtering elements with an if clause:

def get_vowels(string):
    return [vowel for vowel in string if vowel in 'aeiou']

print("Vowels are:", get_vowels('This is some random string'))

Output: Vowels are: ['i', 'i', 'o', 'e', 'a', 'o', 'i']

7. Measuring code execution time

The time module provides all kinds of time-related functions; we can use it to time a piece of code:

import time

start_time = time.time()
total = 0
for i in range(10):
    total += i
print("Sum:", total)
end_time = time.time()
time_taken = end_time - start_time
print("Time: ", time_taken)

Output:
Sum: 45
Time: 0.0009975433349609375

8. Finding the most frequent element

Use max with a key to find the element that occurs most often in a list:

def most_frequent(list):
    return max(set(list), key=list.count)

mylist = [1,1,2,3,4,5,6,6,2,2]
print("The most frequent element is:", most_frequent(mylist))

Output: The most frequent element is: 2

9. Turning two lists into a dictionary

Given two lists, use the elements of list A as keys and the corresponding elements of list B as values to build a dictionary:

def list_to_dictionary(keys, values):
    return dict(zip(keys, values))

list1 = [1, 2, 3]
list2 = ['one', 'two', 'three']
print(list_to_dictionary(list1, list2))

Output: {1: 'one', 2: 'two', 3: 'three'}

10. Exception handling

Python handles runtime errors with try...except...finally (and other combinations of these clauses):

a, b = 1, 0
try:
    print(a/b)
except ZeroDivisionError:
    print("Can not divide by zero")
finally:
    print("Executing finally block")

Output:
Can not divide by zero
Executing finally block

11. Reversing a string

Slicing is the most direct and effective way to reverse a string; it can also be used to check for palindromes:

str = "Hello World"
print("The reversed string is:", str[::-1])

Output: The reversed string is: dlroW olleH

12. Joining a list of strings into one string

Use the join method to combine a list of strings into a single string:

list = ["Hello", "world", "Ok", "Bye!"]
combined_string = " ".join(list)
print(combined_string)

Output: Hello world Ok Bye!

13. Return a default value for a missing dictionary key

A dictionary's get() method returns the value for the given key; if the key is not in the dictionary it returns None, or whatever default you supply.

d = {1: 'one', 2: 'two', 4: 'four'}

# returning 'three' as the default value
print(d.get(3, 'three'))
print("Original dictionary:", d)
# Output:
# three
# Original dictionary: {1: 'one', 2: 'two', 4: 'four'}

14. Swap two variables

Swap the values of two variables without using a temporary variable.

a, b = 5, 10

# method 1
a, b = b, a

# method 2
def swap(x, y):
    return y, x

a, b = swap(a, b)

15. Regular expressions

Regular expressions are used to match and process strings; Python's re module provides the full regex functionality.

import re

text = "The rain in spain"
result = re.search("rain", text)
print(True if result else False)
# Output: True

16. Filter values

The built-in filter() function can be used to filter values.

my_list = [0, 1, 2, 3, 6, 7, 9, 11]
result = filter(lambda x: x % 2 != 0, my_list)
print(list(result))
# Output: [1, 3, 7, 9, 11]

17. Count character frequencies

To count how many times each character appears in a string, the Counter class from the collections module is very concise.

from collections import Counter

result = Counter('banana')
print(result)
# Output: Counter({'a': 3, 'n': 2, 'b': 1})

18. Memory footprint of a variable

The sys module can report how much memory a Python object occupies.

import sys

var1 = 15
list1 = [1, 2, 3, 4, 5]
print(sys.getsizeof(var1))
print(sys.getsizeof(list1))
# Output:
# 28
# 104

19. Chained function calls

Call more than one function in a single line of code.

def add(a, b):
    return a + b

def subtract(a, b):
    return a - b

a, b = 5, 10
print((add if b > a else subtract)(a, b))
# Output: 15

20. Remove duplicates from a list

Duplicates can be removed by iterating and filtering, or directly by converting to a set.

list1 = [1, 2, 3, 3, 4, 'John', 'Ana', 'Mark', 'John']

# method 1: use a set (order is not preserved)
def remove_duplicate(list_value):
    return list(set(list_value))

print(remove_duplicate(list1))

# method 2: preserve the original order
result = []
[result.append(x) for x in list1 if x not in result]
print(result)
# Output:
# [1, 2, 3, 4, 'Ana', 'John', 'Mark']
# [1, 2, 3, 4, 'John', 'Ana', 'Mark']

    Linear Regression Machine Learning example

    """ Linear Regression Machine Learning example: ### Uses data for machine age and time between failures ### ### Predict a model for the data, supervised ML #### https://www.youtube.com/watch?v=2BusGJyn77E """ ## Import packages import tensorflow as tf import numpy import pandas as pd import matplotlib.pyplot as plt rng = numpy.random #Define your spreadsheet spreadsheet = 'LR_ML.xlsx' data = pd.read_excel(spreadsheet) #Define your useful columns of data months = data['Machine Age (Months)'].values MTBF = data['Mean Time Between Failure (Days)'].values # HyperParameters learning_rate = 0.02 training_epochs = 3000 #Parameter display_step = 50 # Training Data (X,Y) Sets train_X = numpy.asarray(months) train_Y = numpy.asarray(MTBF) #Specifying the length of the train_x data n_samples = train_X.shape[0] # tf Graph Input --- Setting the dtype for the placeholder information X = tf.placeholder("float") Y = tf.placeholder("float") # Set model weights This is initializing the guesses of the model for weight and bias W = tf.Variable(rng.randn(), name="weight") b = tf.Variable(rng.randn(), name="bias") # Construct a linear model (y=WX+b) pred = tf.add(tf.multiply(X, W), b) # Mean squared error This is the error in the calculation to try to minimize error = tf.reduce_sum(tf.pow(pred-Y, 2))/(2*n_samples) # Gradient descent # Note, minimize() knows to modify W and b because Variable objects are trainable=True by default optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(error) # Initialize the variables (i.e. assign their default value) init = tf.global_variables_initializer() # Start training with tf.Session() as sess: # Run the initializer sess.run(init) # Fit all training data for epoch in range(training_epochs): for (x, y) in zip(train_X, train_Y): sess.run(optimizer, feed_dict={X: x, Y: y}) # Display logs per epoch step if (epoch+1) % display_step == 0: c = sess.run(error, feed_dict={X: train_X, Y:train_Y}) print("Epoch:", '%04d' % (epoch+1), "error=", "{:.9f}".format(c), \ "W=", sess.run(W), "b=", sess.run(b)) print("Optimization Finished!") training_error = sess.run(error, feed_dict={X: train_X, Y: train_Y}) print("Training error=", training_error, "W=", sess.run(W), "b=", sess.run(b), '\n') # Graphic display plt.plot(train_X, train_Y, 'ro', label='Original data') plt.plot(train_X, sess.run(W) * train_X + sess.run(b), label='Fitted line') plt.legend() plt.show() # Testing example, as requested (Issue #2) test_X = numpy.asarray([2,4,6,8,10]) test_Y = numpy.asarray([25,23,21,19,17]) print("Testing... (Mean square loss Comparison)") testing_error = sess.run( tf.reduce_sum(tf.pow(pred - Y, 2)) / (2 * test_X.shape[0]), feed_dict={X: test_X, Y: test_Y}) # same function as cost above print("Testing error=", testing_error) print("Absolute mean square loss difference:", abs( training_error - testing_error)) plt.plot(test_X, test_Y, 'bo', label='Testing data') plt.plot(train_X, sess.run(W) * train_X + sess.run(b), label='Fitted line') plt.legend() plt.show()

    python stock market realtime monitoring

Alpha Vantage website: https://www.alphavantage.co/
Full code from the video: https://github.com/Derrick-Sherrill/DerrickSherrill.com/blob/master/stocks.py

stocks.py

import pandas as pd
from alpha_vantage.timeseries import TimeSeries
import time

api_key = 'RNZPXZ6Q9FEFMEHM'

ts = TimeSeries(key=api_key, output_format='pandas')
data, meta_data = ts.get_intraday(symbol='MSFT', interval='1min', outputsize='full')
print(data)

i = 1
#while i==1:
#    data, meta_data = ts.get_intraday(symbol='MSFT', interval='1min', outputsize='full')
#    data.to_excel("output.xlsx")
#    time.sleep(60)

close_data = data['4. close']
percentage_change = close_data.pct_change()
print(percentage_change)

last_change = percentage_change[-1]
if abs(last_change) > 0.0004:
    print("MSFT Alert:" + str(last_change))

    python file server

python -m http.server 8000
IP on hp: 192.168.128.93:8000
IP on acer: 192.168.128.77:8000
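
To start the same file server from a script instead of the command line, a minimal sketch using only the standard library looks like this; the port 8000 and the directory name "shared" are placeholder choices, and the directory argument needs Python 3.7+.

import functools
from http.server import HTTPServer, SimpleHTTPRequestHandler

# serve ./shared on every interface, port 8000 (both values are just examples)
handler = functools.partial(SimpleHTTPRequestHandler, directory="shared")
httpd = HTTPServer(("0.0.0.0", 8000), handler)
print("Serving on http://0.0.0.0:8000 ...")
httpd.serve_forever()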

    python ftp server

One-line FTP server in Python. Twisted is an event-driven networking engine written in Python.

pip install twisted

code:

from twisted.protocols.ftp import FTPFactory, FTPRealm
from twisted.cred.portal import Portal
from twisted.cred.checkers import AllowAnonymousAccess, FilePasswordDB
from twisted.internet import reactor

reactor.listenTCP(21, FTPFactory(Portal(FTPRealm('./'), [AllowAnonymousAccess()])))
reactor.run()

    pyftpdlib

pyftpdlib is one of the very best FTP servers out there for Python.

pip3 install pyftpdlib
python -m pyftpdlib

code:

from pyftpdlib import servers
from pyftpdlib.handlers import FTPHandler

address = ("0.0.0.0", 21)  # listen on every IP on my machine, on port 21
server = servers.FTPServer(address, FTPHandler)
server.serve_forever()

To get a list of command line options: python3 -m pyftpdlib --help
To serve on port 21 with write access: python -m pyftpdlib -p 21 -w

Usage: python -m pyftpdlib [options]
Start a stand-alone anonymous FTP server.

Options:
  -h, --help                         show this help message and exit
  -i ADDRESS, --interface=ADDRESS    specify the interface to run on (default all interfaces)
  -p PORT, --port=PORT               specify port number to run on (default 2121)
  -w, --write                        grants write access for logged in user (default read-only)
  -d FOLDER, --directory=FOLDER      specify the directory to share (default current directory)
  -n ADDRESS, --nat-address=ADDRESS  the NAT address to use for passive connections
  -r FROM-TO, --range=FROM-TO        the range of TCP ports to use for passive connections (e.g. -r 8000-9000)
  -D, --debug                        enable DEBUG logging level
  -v, --version                      print pyftpdlib version and exit
  -V, --verbose                      activate a more verbose logging
  -u USERNAME, --username=USERNAME   specify username to login with (anonymous login will be disabled and password required if supplied)
  -P PASSWORD, --password=PASSWORD   specify a password to login with (username required to be useful)
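
For a server with a real username and password instead of anonymous access, pyftpdlib's DummyAuthorizer can be attached to the handler; in this sketch the username, password and directories are placeholders.

from pyftpdlib.authorizers import DummyAuthorizer
from pyftpdlib.handlers import FTPHandler
from pyftpdlib.servers import FTPServer

authorizer = DummyAuthorizer()
# placeholder credentials and share directories -- change before use
authorizer.add_user("user", "12345", "/home/user/shared", perm="elradfmw")
authorizer.add_anonymous("/home/user/public")  # optional read-only anonymous area

handler = FTPHandler
handler.authorizer = authorizer

server = FTPServer(("0.0.0.0", 2121), handler)
server.serve_forever()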

    enable FTP through Chrome on all Windows devices

    In Chrome 81, FTP support is disabled by default, but you can enable it using the # enable-ftp flag. Open Chrome and type “chrome://flags” in the address bar. Once in the flags area, type “enable-ftp” in the search bar stating “search flags”. When you see the “Enable support for FTP URLs” option tap where it says “Default”. Tap “Enable” option. Hit “Relaunch Now” option at the bottom of the page. FTP using Chrome You can download content via ftp://username:password@your-domain.com. But at the moment Chrome does not support uploading of content via FTP. To upload your files you may want to use FileZilla or CuteFTP. Some web browsers, such as Microsoft Internet Explorer, can also be used for FTP purposes and konsoleH includes the File Manager, which allows you to transfer files to and from your upload area.

    create a simple message box in Python

import ctypes  # an included library with Python install

ctypes.windll.user32.MessageBoxW(0, "Your text", "Your title", 1)

Or define a function (Mbox) like so:

import ctypes  # an included library with Python install

def Mbox(title, text, style):
    return ctypes.windll.user32.MessageBoxW(0, text, title, style)

Mbox('Your title', 'Your text', 1)

Note the styles are as follows:
## Styles:
## 0 : OK
## 1 : OK | Cancel
## 2 : Abort | Retry | Ignore
## 3 : Yes | No | Cancel
## 4 : Yes | No
## 5 : Retry | Cancel
## 6 : Cancel | Try Again | Continue

Note: edited to use MessageBoxW instead of MessageBoxA.

    Python For Bluetooth

https://ukbaz.github.io/en/html/reference/bluetooth_overview/index.html Back in 2015 I became aware of Bluetooth BLE Beacons and some of the things that could be done with them. At the same time I was helping on a STEM initiative called Go4SET where I would help students build out ideas of how to solve problems they had observed in the world around them. Their solution would show how electronics and software could be used to solve the problems. As Python was the language of choice in the schools I was working with, I started to investigate how to scan for BLE Beacons using a Raspberry Pi. Here we are in 2020 and I still don’t have a great solution for how to do this, but things have got better in that time and I’ve learnt some things along the way. One of the key things I’ve learnt is that there is a lot of out-of-date information on the internet about Bluetooth. While I suspect my writings will (in time) add to that volume, for now I am aiming for them to be of some help to someone coming to the topic anew. So here is some Python-Linux-Bluetooth information that might help someone getting started.

    Bad Information

    Many tutorials on the internet are done with command-line tools that have been deprecated, such as hcitool and hcidump. If you see tutorials using the HCI (Host Controller Interface) socket then it is either out-of-date or at such a low level that it is best to stay away. The command-line tools recommended by the BlueZ developers are bluetoothctl or, if you need more control, btmgmt. And instead of using hcidump, use btmon. I would also be very nervous about using a library that uses HCI sockets for interfacing with the Bluetooth hardware on Linux. More on the different programming interfaces later.

    But BlueZ…Really?

During the years I’ve been playing around with Bluetooth on Linux I’ve seen people show their frustration with the way that BlueZ handles things. And I see people’s point. An example is that the HCI tools were deprecated and removed. It is hard to find tutorials on how to use the new tools, and answers to questions on the mailing list expect a certain level of knowledge. It is also common for questions to go unanswered on the mailing list. This is Open Source so they don’t owe anyone an answer. However, I have also seen the developers show their frustration that people go off and do crazy things rather than use things the way they had intended. I spent many years of my professional life as an Application Engineer for a software company. My big learning from that time is that if you don’t show people how to use your tool (and make using it the way you wanted the easiest) then smart people will work out their own way of doing it. Having said all of that, the developers have settled on the DBus API and it is getting better and better. The biggest barrier for most people is finding the “on-ramp” to learning how to use it. There are Python examples in the repository, but frankly they are often of limited value.

    BlueZ API

    A list of the possible API’s starting from lowest level and going to the highest. For most people, the higher the better.

    HCI Socket

    As I said earlier, this bypasses the bluetoothd that is running on the Linux system that is used by the desktop tools. Using this is not a great idea unless you really, really know what you are doing. All the information is available in the Bluetooth Core Specification which runs to about 3,256 pages for the 5.2 version of the spec.

    MGMT Socket

The BlueZ Bluetooth Management API is the next step up and the lowest level that the BlueZ developers recommend. The problem for Python users is that this bug makes it difficult to access the mgmt socket. There are other duplicate bugs on this in the system. Until they are fixed, this remains off limits for many Python users.

    DBus API

This should be the go-to level for most people wanting to interact with the BlueZ APIs. However, it seems the number of people that have done things with DBus previously is relatively small, and it is another level of indirection to learn. There are a number of Python libraries that offer DBus bindings. However, there isn’t just one library that is correct for all cases. pydbus is one of the easier ones to get started with. The BlueZ DBus API for interacting with the Bluetooth adapter on your Raspberry Pi is documented at https://git.kernel.org/pub/scm/bluetooth/bluez.git/tree/doc/adapter-api.txt. This tells you that the DBus service is org.bluez. The object path is less obvious from the documentation, but is /org/bluez/hci0 by default on most Linux machines. With this information we can quickly look at properties of the adapter using Python. The example below looks at the adapter's name, whether it is powered, and its MAC address:
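
A minimal sketch of that, assuming BlueZ is running, pydbus is installed, and the adapter is at the default /org/bluez/hci0 path:

from pydbus import SystemBus

bus = SystemBus()
adapter = bus.get('org.bluez', '/org/bluez/hci0')  # default adapter object path

# pydbus exposes the org.bluez.Adapter1 D-Bus properties as Python attributes
print("Name:", adapter.Name)
print("Powered:", adapter.Powered)
print("Address:", adapter.Address)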

    Python For Bluetooth

    If you write applications on iOS or Android, then you will have seen there are some great libraries with API’s that hide much of the gnarly-ness of Bluetooth. With Python there are not those libraries around with that level of abstraction for most things you might want to do. So you might end up going a little deeper and needing to know some of the details of Bluetooth.

    Libraries to help you Bluetooth

There are plenty of them out there. I keep a list of many of them at: https://github.com/ukBaz/python-bluezero/wiki Most of them are pretty niche in what they do. There are a number of them that are abandonware. This isn’t surprising given how big Bluetooth is and the many things you can do with it. It is also really hard to automate the testing of Python Bluetooth libraries and I think this is what ends up being the main reason why the libraries stay niche or abandoned.

    More than one Bluetooth

    Depending on where you are starting from there can be a number of details that can trip people up when they first engage with Bluetooth and code. The first is that there are two different types of Bluetooth. These are generally referred to as Classic and BLE. Devices like the Raspberry Pi support both. While the BBC micro:bit is BLE only. If you try to use Classic (aka BR/EDR, aka rfcomm, aka Serial port profile, aka spp, aka 1101, aka 00001101-0000-1000-8000-00805f9b34fb) on the Raspberry Pi then it will never speak sensibly with a micro:bit. Bluetooth Classic (BR/EDR) supports speeds up to about 24Mbps. It was version 4.0 of the standard that introduced a low energy mode, Bluetooth Low Energy (BLE or LE, also known as “Bluetooth Smart”), that operates at 1Mbps. This mode allows devices to leave their transmitters off most of the time. As a result it is “Low Energy”. These two modes have a different philosophy of how they behave. Classic is a cable replacement. It makes the connection and stays connected. BLE is similar to a database where the transmitter is only on when it is being written to or read from. Clients can also subscribe to notifications when data changes in the Generic ATTribute Profile (GATT). In classic mode there is a server and a client. The server advertises and the client connects. With BLE there are different terms of peripheral and central. A peripheral advertises and a central scans and connects. In BLE you can also have a Broadcaster (beacon) which is a transmitter only (connectionless) application. The Observer (scanner) role is for receiver only connectionless applications.

    Endianness

As with most communication protocols, data is chopped up into bytes that are sent between the two devices. When this is done there is a choice of what order those bytes are transmitted in. This is referred to as endianness. The Bluetooth standard is little-endian, which often trips up people looking at Bluetooth for the first time. The exception to this is when looking at beacons. As far as I can tell this seems to be because Apple did this when they brought out the iBeacon and many have followed that example.
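
A quick illustration with the standard library; the 16-bit Serial Port Profile value 0x1101 is used purely as sample data:

import struct

value = 0x1101  # 16-bit SPP identifier, used here only as example data

print(value.to_bytes(2, byteorder='little').hex())  # '0111' -- the order Bluetooth normally uses
print(value.to_bytes(2, byteorder='big').hex())     # '1101' -- the order beacon payloads tend to use

# struct does the same job: '<' means little-endian, '>' means big-endian
print(struct.pack('<H', value).hex())  # '0111'
print(struct.pack('>H', value).hex())  # '1101'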

    Binary

Because Bluetooth has come out of the embedded world there are lots of binary numbers referring to things rather than nice string names. Lots of values are 128 bits in length. This means that when I want to look at the status of button A on a micro:bit I need to look in the GATT database for E95DDA90-251D-470A-A062-FA1922DFA9A8. In classic mode, the Serial Port Profile (SPP) is normally referred to by the 16-bit hex value 0x1101. However, it is really a 128-bit value, but because it is an official profile it can be shortened to a 16-bit value.

    Bluetooth Special Interest Group (SIG) Reserved Values

The SIG has the following base UUID reserved; the xxxx below is replaced with the 16-bit value: 0000xxxx-0000-1000-8000-00805f9b34fb. If you see a tutorial that uses 16-bit values without using official SIG profiles, be suspicious about whether it is a good tutorial.
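
A small helper that expands a 16-bit SIG value into the full 128-bit form using that reserved base UUID; the 0x1101 SPP value is just an example:

BASE_UUID = "0000{0:04x}-0000-1000-8000-00805f9b34fb"

def uuid16_to_uuid128(short_uuid):
    """Expand an official 16-bit SIG value into its full 128-bit UUID string."""
    return BASE_UUID.format(short_uuid)

print(uuid16_to_uuid128(0x1101))
# 00001101-0000-1000-8000-00805f9b34fb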

    Asynchronous

There are parts of Bluetooth that just need to be asynchronous. Examples are scanning for new devices or getting notifications from a peripheral. While this is possible to do with Python, asynchronous programming isn’t the way most people learn Python. For BlueZ, it works with the GLib event loop, which will be familiar to people that have coded GUIs in Python.
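
A bare-bones sketch of that event-loop style, assuming PyGObject (the gi package) is installed; the timer callback here just stands in for a real Bluetooth event handler:

from gi.repository import GLib  # provided by PyGObject

def on_timeout():
    # stand-in for a real handler, e.g. a device-found or notification callback
    print("tick")
    return True  # returning True keeps the timer, and so the callback, alive

loop = GLib.MainLoop()
GLib.timeout_add_seconds(1, on_timeout)
try:
    loop.run()  # blocks here; callbacks fire from inside the loop
except KeyboardInterrupt:
    loop.quit()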

    Pairing and Connecting

I have seen confusion between these two terms when people come to programming Bluetooth. Pairing is about the two devices exchanging information so that they can communicate securely. So pairing is a one-off activity to exchange credentials. It is not always required, as sometimes it is OK for devices to exchange information without being secure, especially if you are just learning, since that simplifies the processes involved. Connecting needs to be done every time you want the devices to start communicating. It is a straightforward step if the two devices already know about each other. I typically recommend that the one-off setup of scanning and pairing is done manually with bluetoothctl.

    RFCOMM (Or is that SPP?)

This is the most useful profile in classic mode for many activities in the maker community, when you want to exchange information between two boards that support a Bluetooth serial connection. From Python 3.3 this is supported within the standard socket library. Below is an example of a client connecting to a server. This assumes the pairing has already happened and will do the connection.

>>> import socket
>>> s = socket.socket(socket.AF_BLUETOOTH, socket.SOCK_STREAM, socket.BTPROTO_RFCOMM)
>>> s.connect(('B8:27:EB:22:57:E0', 1))
>>> s.send(b'Hello')
>>> s.recv(1024)
b'world'
>>> s.close()

If this just works then life is great. If there are issues, then this is when Bluetooth can become more frustrating. Debugging is probably a separate post.
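
For completeness, a sketch of the matching server side with the same standard socket module; the adapter address and channel are placeholders and pairing is assumed to have already been done:

import socket

server = socket.socket(socket.AF_BLUETOOTH, socket.SOCK_STREAM, socket.BTPROTO_RFCOMM)
server.bind(('B8:27:EB:22:57:E0', 1))  # placeholder local adapter address, RFCOMM channel 1
server.listen(1)

client, address = server.accept()  # blocks until a client connects
print("Connected from", address)
print(client.recv(1024))           # b'Hello' from the client example above
client.send(b'world')

client.close()
server.close()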

    BLE (Or is that GATT)

With BLE there is not the same level of support in native Python, so the DBus API is required. This means using the Device and GATT APIs. The difficult piece with these is that it is not known ahead of connection what the DBus object path will be for the devices, GATT services, and GATT characteristics we are interested in. This results in the need to do a reverse look-up from the UUID to the object path. This was the subject of a kata I held at my local Python user group.
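
A sketch of that reverse look-up with pydbus and the BlueZ ObjectManager; the characteristic UUID below is the micro:bit button A value mentioned earlier and is only an example:

from pydbus import SystemBus

TARGET_UUID = 'e95dda90-251d-470a-a062-fa1922dfa9a8'  # micro:bit button A state (example)

bus = SystemBus()
mngr = bus.get('org.bluez', '/')  # exposes org.freedesktop.DBus.ObjectManager

char_path = None
for path, interfaces in mngr.GetManagedObjects().items():
    props = interfaces.get('org.bluez.GattCharacteristic1')
    if props and props.get('UUID') == TARGET_UUID:  # BlueZ reports UUIDs in lower case
        char_path = path
        break

print("Characteristic object path:", char_path)
# A proxy for that path can then be used to ReadValue()/WriteValue() or to enable notifications.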

    Good To Know

    This talk at Embedded Linux Conference gave lots of good insight in to how things are done with BlueZ. It is worth a watch if you are interested in learning more.

    Python, Bluetooth, and Windows…

In Python 3.9 it is going to be easier to use Bluetooth RFCOMM (Serial Port Profile) thanks to this submission: https://bugs.python.org/issue36590

The example findmyphone.py demonstrates using a small Python program to look for a nearby Bluetooth device named My Phone. The example is shown below; just change target_name to the name of the Bluetooth device you want to find.

import bluetooth

target_name = "My Phone"
target_address = None

nearby_devices = bluetooth.discover_devices()

for bdaddr in nearby_devices:
    if target_name == bluetooth.lookup_name(bdaddr):
        target_address = bdaddr
        break

if target_address is not None:
    print("found target bluetooth device with address", target_address)
else:
    print("could not find target bluetooth device nearby")

A Bluetooth address has the form xx:xx:xx:xx:xx:xx, where each xx is a hexadecimal value, and every Bluetooth device has a unique address (looking up your own device's address is covered elsewhere). If we want to find a device by a given name rather than by its address, there are two steps, using findmyphone.py above as the example: first the program scans for nearby Bluetooth devices by calling discover_devices(), which searches for roughly 10 seconds and returns a list; then lookup_name() is used to contact each detected device, request its device name, and check whether that name matches the My Phone target name. If it does, the program reports that the device was found and prints its Bluetooth address. Scanning for devices and looking up their names can sometimes fail (interference, lots of devices, devices that are moving around), and the name lookup can come back as None, in which case matching by name cannot continue; the best fix is usually just to try a few more times. https://people.csail.mit.edu/albert/bluez-intro/c212.html

    Ciphey

Installation: python3 -m pip install ciphey --upgrade
On Windows, Python defaults to installing 32-bit; Ciphey only supports 64-bit, so make sure you're using 64-bit Python.
There are 3 ways to run Ciphey.
File input: ciphey -f encrypted.txt
Unqualified input: ciphey -- "Encrypted input"
Normal way: ciphey -t "Encrypted input"
To get rid of the progress bars, probability table, and all the noise, use quiet mode: ciphey -t "encrypted text here" -q
For a full list of arguments, run ciphey --help.
Importing Ciphey: you can import Ciphey's main and use it in your own programs and code: from Ciphey.__main__ import main

    47 个 Python 人工智能库

    The List
    Python 核心库
    Python 机器学习
    Python 深度学习
    Python 分布式深度学习库
    Python 自然语言处理
    Python 计算机视觉
    Python 生物/化学

    The List

    Numpy 库 https://www.numpy.org.cn/ SciPy 库 https://www.scipy.org/ Pandas 库 https://pandas.pydata.org/ statsmodels 库 https://www.statsmodels.org/ Scikit-Learn 库 https://scikit-learn.org.cn/ LightGBM 库 https://lightgbm.readthedocs.io CatBoost 库 https://catboost.ai/ Eli5 库 https://eli5.readthedocs.io Theano 库 https://pypi.org/project/Theano/ PyBrain库 https://github.com/pybrain/pybrain/ Shogun库 https://github.com/shogun-toolbox/shogun Chainer库 https://www.cnpython.com/pypi/chainerrl PyLearn2库 http://github.com/lisa-lab/pylearn2 Hebel库 https://www.oschina.net/p/hebel/ Neurolab库 https://pythonhosted.org/neurolab/ TensorFlow 库 https://www.tensorflow.org/ PyTorch 库 https://pytorch.org/ Keras 库 https://keras.io/zh/ Caffe2 库 http://caffe.berkeleyvision.org/ dist-Keras 库 https://joerihermans.com/work/distributed-keras/ elephas 库 https://pypi.org/project/elephas/ Spark-Deep-Learning 库 https://databricks.github.io/spark-deep-learning/ Mxnet库 https://pypi.org/project/mxnet/ Sklearn-theano库 https://github.com/sklearn-theano/ NLTK 库 https://www.nltk.org/ SpaCy 库 https://spacy.io/ PKUSeg 库 https://pypi.org/project/pkuseg/ Gensim 库 https://radimrehurek.com/gensim/ CoreNLP 库 https://stanfordnlp.github.io/CoreNLP/ TextBlob 库 https://pypi.org/project/textblob/ Stanfordnlp 库 https://github.com/stanfordnlp/stanfordnlp openCV 库 https://opencv.org/ Scikit-Image 库 https://scikit-image.org/ Pillow/PIL 库 https://pillow.readthedocs.io/en/stable SimpleCV 库 http://simplecv.org/ Mahotas 库 https://pypi.org/project/mahotas/0.99/ ITK 库 https://itk.org/ Pgmagick 库 https://pythonhosted.org/pgmagick/index.html Pycairo 库 https://www.cairographics.org/pycairo/ Fastai库 https://pypi.org/project/fastai/ Imutils库 https://pypi.org/project/imutils/ PyTorchCV库 https://pytorch-cn.readthedocs.io/zh/latest/ BioPython 库 https://biopython-cn.readthedocs.io/ DashBio 库 http://dash.plot.ly/dash-bio RDKit 库 http://www.rdkit.org/

    Python 核心库

    1.Numpy 库 https://www.numpy.org.cn/ 特点:NumPy (Numerical Python) 是 Python 语言的一个扩展程序库,支持大量的维度数组与矩阵运算,此外也针对数组运算提供大量的数学函数库。 NumPy 通常 SciPy(Scientific Python)和 Matplotlib(绘图库)一起使用,这种组合广泛用于替代 MatLab,是一个强大的科学计算环境,有助于我们通过 Python 学习数据科学或者机器学习。 2.SciPy 库 https://www.scipy.org/ 特点:SciPy 是一个开源的 Python 算法库和数学工具包。SciPy 包含的模块有最优化、线性代数、积分、插值、特殊函数、快速傅里叶变换、信号处理和图像处理、常微分方程求解和其他科学与工程中常用的计算。它用于有效计算 Numpy 矩阵,使 Numpy 和 Scipy 协同工作,高效解决问题。 3.Pandas 库 https://pandas.pydata.org/ 特点:Pandas 是 Python 语言的一个扩展程序库,用于数据分析。Pandas 是一个开放源码、BSD许可的库,提供高性能、易于使用的数据结构和数据分析工具,基础是 Numpy(提供高性能的矩阵运算),可以从各种文件格式比如CSV、JSON、SQL、Excel导入数据。Pandas 可以对各种数据进行运算操作,比如归并、再成形、选择,还有数据清洗和数据加工特征。Pandas 广泛应用在学术、金融、统计学等各个数据分析领域。 4.statsmodels 库 https://www.statsmodels.org/ 特点:statsmodels 是一个 Python 库,用于拟合多种统计模型,执行统计测试以及数据探索和可视化。statsmodels 包含更多的“经典”频率学派统计方法,而贝叶斯方法和机器学习模型可在其他库中找到。包含在 statsmodels 中的一些模型:线性模型,广义线性模型和鲁棒线性模型,线性混合效应模型,方差分析(ANOVA)方法,时间序列过程和状态空间模型,广义的矩量法。

    Python 机器学习

    5.Scikit-Learn 库 https://scikit-learn.org.cn/ 特点:Scikit-learn(以前称为scikits.learn,也称为sklearn)是针对 Python 编程语言的免费软件机器学习库。它具有各种分类,回归和聚类算法,包括支持向量机,随机森林,梯度提升,k均值和DBSCAN,并且旨在与 Python 数值科学库 NumPy 和 SciPy 联合使用。 // 6.XGBoost 库 https://xgboost.ai/ 特点:XGBoost是一个优化的分布式梯度增强库,旨在实现高效,灵活和便携。它在 Boosting框架下实现机器学习算法。XGBoost提供并行树提升(也称为GBDT,GBM),可以快速准确地解决许多数据科学问题。相同的代码在主要的分布式环境(Hadoop,SGE,MPI)上运行,并且可以解决数十亿个示例之外的问题。 7.LightGBM 库 https://lightgbm.readthedocs.io 特点:LightGBM(Light Gradient Boosting Machine) 是微软开源的一个实现 GBDT 算法的框架,支持高效率的并行训练。LightGBM 提出的主要原因是为了解决 GBDT 在海量数据遇到的问题,让 GBDT 可以更好更快地用于工业实践。其具有以下优点:更快的训练速度、更低的内存消耗、更好的准确率、分布式支持,可以快速处理海量数据。 8.CatBoost 库 https://catboost.ai/ 特点:CatBoost 是由 Yandex 的研究人员和工程师开发的基于梯度提升决策树的机器学习方法,现已开源。CatBoost 在 Yandex 公司内广泛使用,用于排列任务、预测和提出建议。CatBoost 是通用的,可应用于广泛的领域和各种各样的问题。 9.Eli5 库 https://eli5.readthedocs.io 特点:ELI5 是一个 Python 库,允许使用统一API可视化地调试各种机器学习模型。它内置了对多个ML框架的支持,并提供了一种解释黑盒模型的方法。它有助于调试机器学习分类器并解释它们的预测。 10.Theano 库 https://pypi.org/project/Theano/ 特点:Theano 是一个 Python 库,专门用于定义、优化、求值数学表达式,效率高,适用于多维数组。特别适合做机器学习。一般来说,使用时需要安装 Python 和 Numpy 。 11.PyBrain库 https://github.com/pybrain/pybrain/ 特点:PyBrain的概念是将一系列的数据处理的算法封装到被称之为Module的模块中。一个最小的Module通常包含基于机器学习算法的可调整的参数集合。 12.Shogun库 https://github.com/shogun-toolbox/shogun 特点:Shogun是一个开源机器学习库,它提供广泛的高效和统一的机器学习方法,如多种数据表示、算法类和通用工具的组合,用于快速原型设计数据管道。

    Python 深度学习

    13.Chainer库 https://www.cnpython.com/pypi/chainerrl 特点:ChainerCV是一个基于Chainer用于训练和运行计算机视觉任务的神经网络工具。它涵盖了计算机视觉模型的高质量实现,以及开展计算机视觉研究的必备工具集。 14. PyLearn2库 http://github.com/lisa-lab/pylearn2 特点:Pylearn2是一个基于Theano的机器学习库,它的大部分功能是基于Theano顶层实现的。这意味着用户可以用数学表达式去编写Pylearn2插件(新模型、算法等)。 15.Hebel库 https://www.oschina.net/p/hebel/ 特点:Hebel 是一个通过 PyCUDA 库使用 GPU CUDA 来加速建立神经网络的深度学习库。它实现了几类最重要的神经网络模型,提供各种激活函数和训练模型。 16.Neurolab库 https://pythonhosted.org/neurolab/ 特点:neurolab是一个简单而强大的Python神经网络库。包含基于神经网络、训练算法和灵活的框架来创建和探索其他神经网络类型。 17.TensorFlow 库 https://www.tensorflow.org/ 特点:TensorFlow 是一个基于数据流编程(dataflow programming)的符号数学系统,被广泛应用于各类机器学习(machine learning)算法的编程实现,其前身是谷歌的神经网络算法库 DistBelief 。Tensorflow 拥有多层级结构,可部署于各类服务器、PC终端和网页并支持GPU和TPU高性能数值计算,被广泛应用于谷歌内部的产品开发和各领域的科学研究。 18.PyTorch 库 https://pytorch.org/ 特点:PyTorch 是一个开源的 Python 机器学习库,基于 Torch,用于自然语言处理等应用程序。PyTorch 的前身是 Torch ,其底层和 Torch 框架一样,但是使用 Python 重新写了很多内容,不仅更加灵活,支持动态图,而且提供了 Python接口。它是由 Torch7 团队开发,是一个以 Python 优先的深度学习框架,不仅能够实现强大的GPU加速,同时还支持动态神经网络。PyTorch 既可以看作加入了GPU支持的 Numpy,同时也可以看成一个拥有自动求导功能的强大的深度神经网络。除了 Facebook 外,它已经被Twitter、CMU 和 Salesforce 等机构采用。 19.Keras 库 https://keras.io/zh/ 特点:Keras 是一个由 Python 编写的开源人工神经网络库,可以作为 Tensorflow、 Microsoft-CNTK 和 Theano 的高阶应用程序接口,进行深度学习模型的设计、调试、评估、应用和可视化。Keras 在代码结构上由面向对象方法编写,完全模块化并具有可扩展性。Keras 支持现代人工智能领域的主流算法,包括前馈结构和递归结构的神经网络,也可以通过封装参与构建统计学习模型。在硬件和开发环境方面,Keras 支持多操作系统下的多GPU并行计算,可以根据后台设置转化为 Tensorflow、Microsoft-CNTK 等系统下的组件。 20.Caffe2 库 http://caffe.berkeleyvision.org/ 特点:Caffe是由Berkeley Vision and Learning Center(BVLC)建立的深度学习框架。它是模块化的,速度极快。

    Python 分布式深度学习库

    21.dist-Keras 库 https://joerihermans.com/work/distributed-keras/ 特点:dist-Keras 是在 Apache Spark 和 Keras 之上构建的分布式深度学习框架,其重点是“最先进的”分布式优化算法。以易于实现新的分布式优化器的方式设计了框架,从而使人们能够专注于研究。支持多种分布式方法,例如但不限于使用数据并行方法训练合奏和模型。 22.elephas 库 https://pypi.org/project/elephas/ 特点:elephas 是一个把 Python 深度学习框架 Keras 衔接到 Spark 集群的第三方 python 包。 23.Spark-Deep-Learning 库 https://databricks.github.io/spark-deep-learning/ 特点:Spark-Deep-Learning 为使用 Apache Spark 的 Python 中可伸缩的深度学习提供了高级api。该库来自 Databricks ,并利用 Spark 实现了两个最强大的方面:本着 Spark 和 Spark MLlib 的精神,它提供了易于使用的API,能够在很少的代码行中进行深入学习;它使用 Spark 强大的分布式引擎来扩展对海量数据集的深度学习。 24.Mxnet库 https://pypi.org/project/mxnet/ 特点:MXNet 是一款设计为效率和灵活性的深度学习框架。它允许你混合符号编程和命令式编程,从而最大限度提高效率和生产力。 25.Sklearn-theano库 https://github.com/sklearn-theano/ 特点:sklearn-theano的功能所在。你不能用它从头到尾的训练一个模型,但它的神奇之处就是可以把网络作为特征提取器。

    Python 自然语言处理

    26.NLTK 库 https://www.nltk.org/ 特点:NLTK(Natural Language Toolkit)自然语言处理工具包,是 NLP 研究领域常用的一个 Python 库,由宾夕法尼亚大学的 Steven Bird 和 Edward Loper 在 Python 的基础上开发的一个模块,至今已有超过十万行的代码。这是一个开源项目,包含数据集、 Python 模块、教程等。 27.SpaCy 库 https://spacy.io/ 特点:SpaCy 是一个 Python 和 CPython 的 NLP 自然语言文本处理库。SpaCy主要功能包括分词、词性标注、词干化、命名实体识别、名词短语提取等等。 28.PKUSeg 库 https://pypi.org/project/pkuseg/ 特点:PKUSeg-Python 是由北京大学语言计算与机器学习研究组研制推出的一个高准确度的中文分词工具包。PKUSeg-Python 简单易用,支持多领域分词,在不同领域的数据上都大幅提高了分词的准确率。 29.Gensim 库 https://radimrehurek.com/gensim/ 特点:Gensim 是一个相当专业的主题模型 Python 工具包。在文本处理中,比如商品评论挖掘,有时需要了解每个评论分别和商品的描述之间的相似度,以此衡量评论的客观性。评论和商品描述的相似度越高,说明评论的用语比较官方,不带太多感情色彩,比较注重描述商品的属性和特性,角度更客观。Gensim 就是 Python 里面计算文本相似度的程序包。 30.CoreNLP 库 https://stanfordnlp.github.io/CoreNLP/ 特点:Stanford CoreNLP 提供了一套人类语言技术工具。支持多种自然语言处理基本功能,Stanford CoreNLP 是它的一个 Python 接口。Stanford CoreNLP 主要功能包括分词、词性标注、命名实体识别、句法结构分析和依存分析等等。 31.TextBlob 库 https://pypi.org/project/textblob/ 特点:用于处理文本数据的Python库。它提供一个简单的API,可用于深入研究常见的NLP任务,如词性标注、名词短语提取、情感分析、文本翻译、分类等。 32.Stanfordnlp 库 https://github.com/stanfordnlp/stanfordnlp 特点:Stanford NLP提供了一系列自然语言分析工具。它能够给出基本的词形、词性,并且能够标记句子的结构,语法形式和字词的依赖,指明那些名字指向同样的实体,指明情绪,提取发言中的开放关系等。

    Python 计算机视觉

    33.openCV 库 https://opencv.org/ 特点:OpenCV 是一个基于BSD许可(开源)发行的跨平台计算机视觉和机器学习软件库,可以运行在 Linux、Windows、Android 和 MacOS 操作系统上。它轻量级而且高效——由一系列 C 函数和少量 C++ 类构成,同时提供了 Python、Ruby、MATLAB 等语言的接口,实现了图像处理和计算机视觉方面的很多通用算法。 34.Scikit-Image 库 https://scikit-image.org/ 特点:Scikit-Image 是图像处理算法的集合,采用 Python 语言编写。它实现了用于研究、教育和工业应用的算法和实用程序。它是一个相当简单和直接的库,即使对于 Python 生态系统的新手也是如此。 35.Pillow/PIL 库 https://pillow.readthedocs.io/en/stable 特点:PIL(Python Imaging Library)已经是 Python 平台事实上的图像处理标准库了。PIL 功能非常强大,但API却非常简单易用。由于 PIL 仅支持到 Python2.7,加上年久失修,于是一群志愿者在 PIL 的基础上创建了兼容的版本,名字叫 Pillow,支持最新Python 3.x,又加入了许多新特性。 36.SimpleCV 库 http://simplecv.org/ 特点:SimpleCV 将很多强大的开源计算机视觉库包含在一个便捷的 Python 包中。使用 SimpleCV,你可以在统一的框架下使用高级算法,例如特征检测、滤波和模式识别。使用者不用清楚一些细节,比如图像比特深度、文件格式、颜色空间、缓冲区管理、特征值还有矩阵和图像的存储。 37.Mahotas 库 https://pypi.org/project/mahotas/0.99/ 特点:Mahotas 是一个 Python 的图像处理库,包含大量的图像处理算法,使用 C++ 实现的算法,处理性能相当好。 38.ITK 库 https://itk.org/ 特点:ITK( Insight Segmentation and Registration Toolkit)是美国国家卫生院下属的国立医学图书馆开发的一款医学图像处理软件包,是一个开源的、跨平台的影像分析扩展软件工具。 39.Pgmagick 库 https://pythonhosted.org/pgmagick/index.html 特点:Pgmagick 是 GraphicsMagick 库的一个基于 Python 的包装器。图像处理系统有时被称为图像处理的瑞士军刀。它提供了一个健壮而高效的工具和库集合,支持以88种主要格式(包括重要格式,如DPX、GIF、JPEG、JPEG-2000、PNG、PDF、PNM和TIFF)读取、写入和操作图像。 40.Pycairo 库 https://www.cairographics.org/pycairo/ 特点:pyCairo 是一个 Python 的优秀2D图形渲染库。 41.Fastai库 https://pypi.org/project/fastai/ 特点:计算机视觉、文本、表格数据、时间序列、协同过滤等常见深度学习应用提供单一一致界面的深度学习库。 42.Imutils库 https://pypi.org/project/imutils/ 特点:imutils是在OPenCV基础上的一个封装,达到更为简结的调用OPenCV接口的目的,它可以轻松的实现图像的平移,旋转,缩放,骨架化等一系列的操作。 43.PyTorchCV库 https://pytorch-cn.readthedocs.io/zh/latest/ 特点:TorchCV 支持图像分类、语义分割、目标检测、姿态检测、实例分割、生成对抗网络等任务中的多个常见模型。

    Python 生物/化学

    45.BioPython 库 https://biopython-cn.readthedocs.io/ 特点:Biopython 项目是旨在减少计算生物学中代码重复的开源项目之一,由国际开发人员协会创建。它包含表示生物序列和序列注释的类,并且能够读取和写入各种文件格式(FASTA,FASTQ,GenBank 和 Clustal 等),支持以程序化方式访问生物信息的在线数据库(例如,NCBI)。独立的模块扩展了 Biopython 的序列比对,蛋白质结构,群体遗传学,系统发育,序列基序和机器学习等功能。 46.DashBio 库 http://dash.plot.ly/dash-bio 特点:Dash Bio 是一个免费的开源 Python 库,用于生物信息学和药物开发应用。 47.RDKit 库 http://www.rdkit.org/ 特点:RDKit 是一个用于化学信息学的开源工具包,基于对化合物2D和3D分子操作,利用机器学习方法进行化合物描述符生成,fingerprint 生成,化合物结构相似性计算,2D和3D分子展示等。基于Python语言进行调取使用。

    Python 爬虫 pyquery

    在使用之前,请确保已经安装好qyquery库。安装教程如下所示: pip install pyquery

    初始化

    和Beautiul Soup一样,在初始化pyquery的时候,也需要传入html文本来初始化一个pyquery对象。 初始化的时候一般有三种传入方式:传入字符串、传入URL、传入html文件。 字符串初始化 html = ''' <div> <ul> <li class="item-0">first-item</li> <li class="item-1"><a href="link2.html">second item</a></li> <li class="item=-0 active"><a href="link3.html"><span class=""bold>third item</span></a></li> <li class="item-1 active"><a href="link4.html">fourth item</a></li> <li class="item-0"><a href="link5.html">fifth item</a></li> </ul> </div> ''' from pyquery import PyQuery as pq doc = pq(html) print(doc) print(type(doc)) print(doc('li')) 先对上面的代码做简单的描述: 首先引入PyQuery对象,取名为pq。然后声明一个长HTML字符串,并将其当作参数传给PyQuery类,这样就成功的进行了初始化。 接下来将css选择器作为参数传入初始化对象,在这个示例中我们传入li节点,这样就可以选择所有的li节点.。 URL初始化 初始化对象的参数不仅可以是字符串,还可以是网页的URL,这时可以将URL作为参数传入初始化对象。 具体代码如下所示: from pyquery import PyQuery as pq doc = pq('https://www.baidu.com', encoding='utf-8') print(doc) print(type(doc)) print(doc('title')) 试着运行上面的代码你会发现,我们成功的获取到了百度的title节点和网页信息。 PyQuery对象会先请求这个URL,然后用得到的HTML内容完成初始化,这其实就相当于网页源代码以字符串的形式传递给初始化对象。 因此,还可以这样写代码: from pyquery import PyQuery as pq import requests url = 'https://www.baidu.com' doc = pq(requests.get(url).content.decode('utf-8')) print(doc) print(type(doc)) print(doc('title')) 运行结果与上面那段代码的运行结果是一致的。 文件初始化 除了传递URL以外还可以传递本地的文件名,此时只要传递本地文件名,此时将参数指定为filename即可。 具体代码如下所示: from pyquery import PyQuery as pq doc = pq(filename='baidu.html') print(doc) print(type(doc)) print(doc('title')) 以上三种初始化的方式都是可以的,当然最常用的初始化方式还是以字符串的形式传递。

    基本CSS选择器

    html = ''' <div id="container"> <ul class="list"> <li class="item-0">first-item</li> <li class="item-1"><a href="link2.html">second item</a></li> <li class="item=-0 active"><a href="link3.html"><span class=""bold>third item</span></a></li> <li class="item-1 active"><a href="link4.html">fourth item</a></li> <li class="item-0"><a href="link5.html">fifth item</a></li> </ul> </div> ''' from pyquery import PyQuery as pq doc = pq(html) print(doc('#container .list li')) print(type(doc('#container .list li'))) 初始化PyQuery对象之后,传入CSS选择器#container .list li将所有符合条件的节点输出,并且运行上面的代码之后你会发现它的类型依然还是PyQuery类型。

    查找节点

    下面介绍一些常用的查询函数,这些函数与jQuery函数的用法是完全相同的。 子节点 查找子节点时需要用到find()方法,并传入的参数是CSS选择器,以前面的html为例子。 from pyquery import PyQuery as pq doc = pq(html) print(doc.find('li')) print(type(doc.find('li'))) 调用find()方法,将节点名称li传入该方法,获取所有符合条件的内容。类型依然还是PyQuery。 当然我们还可以这样写: from pyquery import PyQuery as pq doc = pq(html) items = doc('.list') print(type(items)) lis = items.find('li') print(type(lis)) print(lis) 首先先选取class为list的节点,然后调用find()方法,传入CSS选择器,选取内部的``li`节点,最后打印输出。 其实find()方法是查找所有的子孙节点,要获取所有的子节点可以调用chirdren()方法。具体代码如下所示: from pyquery import PyQuery as pq doc = pq(html) items = doc('.list') lis = items.children() print(lis) print(type(lis)) 如果想要筛选子节点中符合条件的节点,可以向chirdren()方法传入CSS选择器。具体代码如下所示: from pyquery import PyQuery as pq doc = pq(html) items = doc('.list') lis = items.children('.active') print(lis) print(type(lis)) 试着运行上面的代码你会发现,这里已经成功获取到了class为active的节点。 父节点 我们可以调用parent()方法来获取某个节点的父节点。 html = ''' <div id="container"> <ul class="list"> <li class="item-0">first-item</li> <li class="item-1"><a href="link2.html">second item</a></li> <li class="item=-0 active"><a href="link3.html"><span class=""bold>third item</span></a></li> <li class="item-1 active"><a href="link4.html">fourth item</a></li> <li class="item-0"><a href="link5.html">fifth item</a></li> </ul> </div> ''' from pyquery import PyQuery as pq doc = pq(html) items = doc('.list') container = items.parent() print(container) print(type(container)) 先对上面的代码做简要的说明: 首先选取class为list的节点,然后再调用parent()方法得到其父节点,其类型依然还是PyQuery类型。 这里的父节点是直接父节点,但是如果要获取祖父节点,可以调用parents()方法。 html = ''' <div class="wrap"> <div id="container"> <ul class="list"> <li class="item-0">first-item</li> <li class="item-1"><a href="link2.html">second item</a></li> <li class="item=-0 active"><a href="link3.html"><span class=""bold>third item</span></a></li> <li class="item-1 active"><a href="link4.html">fourth item</a></li> <li class="item-0"><a href="link5.html">fifth item</a></li> </ul> </div> </div> ''' from pyquery import PyQuery as pq doc = pq(html) items = doc('.list') container = items.parents() print(container) print(type(container)) 运行上面的代应为码之后,你会发现这里输出的内容有四个,因为class为list节点的祖父节点有四个,分别是:container、wrap、body、html。在初始化对象的时候已经添加上了body和html节点。 兄弟节点 除了可以获取到父节点和子节点之外,还可以获取到兄弟节点。如果需要获取兄弟节点,可以调用siblings()方法。 具体代码如下所示: html = ''' <div class="wrap"> <div id="container"> <ul class="list"> <li class="item-0">first-item</li> <li class="item-1"><a href="link2.html">second item</a></li> <li class="item-0 active"><a href="link3.html"><span class=""bold>third item</span></a></li> <li class="item-1 active"><a href="link4.html">fourth item</a></li> <li class="item-0"><a href="link5.html">fifth item</a></li> </ul> </div> </div> ''' from pyquery import PyQuery as pq doc = pq(html) items = doc('.list .item-0.active') print(items.siblings()) 这里首先选取类为.item-0.active的节点,再调用siblings()方法获取到该节点的兄弟节点。 试着运行上面的代码,你会发现获取到其他四个兄弟节点。

    遍历

    通过上面的代码可以观察到,pyquery的选择结果可能是多个节点,也可能是单个节点,类型都是PyQuery类型,并没有向Beautiful Soup那样的列表。 对于单个节点来说,可以直接打印输出,也可以直接转成字符串。 from pyquery import PyQuery as pq doc = pq(html) items = doc('.list .item-0.active') print(items) print(str(items)) print(type(items)) 对于多个节点,可以通过调用item()方法,将获取的内容转换成生成器类型,在通过遍历的方式输出。 具体代码如下所示: from pyquery import PyQuery as pq doc = pq(html) lis = doc('li').items() print(lis) for li in lis: print(li, type(li)) 运行上面的代码,你会发现输出变量lis的结果是生成器,因此可以遍历输出。

    获取信息

    一般来说,在网页里面我们需要获取的信息有两类:一类是文本内容,另一类是节点属性值。 获取属性 获取到某个PyQuery类型的节点之后,就可以通过attr()方法来获取属性。 具体代码如下所示: from pyquery import PyQuery as pq doc = pq(html) a = doc('.list .item-0.active a') print(a.attr('href')) 先获取class为list下面的class为item-0 active的节点下的a节点,这时变量a是PyQuery类型,再调用attr()方法并传入属性值href。 当然也可以通过调用attr属性来获取属性。 print(a.attr.href) 你会发现输出结果与上面的代码是一样的。 当然,我们也可以获取到所有a节点的属性,具体代码如下所示: html = ''' <div class="wrap"> <div id="container"> <ul class="list"> <li class="item-0">first-item</li> <li class="item-1"><a href="link2.html">second item</a></li> <li class="item-0 active"><a href="link3.html"><span class=""bold>third item</span></a></li> <li class="item-1 active"><a href="link4.html">fourth item</a></li> <li class="item-0"><a href="link5.html">fifth item</a></li> </ul> </div> </div> ''' from pyquery import PyQuery as pq doc = pq(html) a = doc('a').items() for item in a: print(item.attr('href')) 但是如果代码这样写: from pyquery import PyQuery as pq doc = pq(html) a = doc('a') print(a.attr('href')) 运行上面的代码之后,你会发现只获取到第一个a节点的href属性。 所有这个是需要注意的地方!! 提取文本 提取文本与提取属性的逻辑是一样的,首先获取到class为PyQuery的节点,再调用text()方法获取文本。 首先来获取一个节点的文本内容。具体代码如下所示: html = ''' <div class="wrap"> <div id="container"> <ul class="list"> <li class="item-0">first-item</li> <li class="item-1"><a href="link2.html">second item</a></li> <li class="item-0 active"><a href="link3.html"><span class=""bold>third item</span></a></li> <li class="item-1 active"><a href="link4.html">fourth item</a></li> <li class="item-0"><a href="link5.html">fifth item</a></li> </ul> </div> </div> ''' from pyquery import PyQuery as pq doc = pq(html) a = doc('.list .item-0.active a') print(a.text()) 试着运行上面的代码你会发现成功获取a节点的文本内容。 接下来我们就来获取多个li节点的文本内容。 具体代码如下所示: html = ''' <div class="wrap"> <div id="container"> <ul class="list"> <li class="item-0">first-item</li> <li class="item-1"><a href="link2.html">second item</a></li> <li class="item-0 active"><a href="link3.html"><span class=""bold>third item</span></a></li> <li class="item-1 active"><a href="link4.html">fourth item</a></li> <li class="item-0"><a href="link5.html">fifth item</a></li> </ul> </div> </div> ''' from pyquery import PyQuery as pq doc = pq(html) items = doc('li') print(items.text()) 运行上面的代码,你会发现该代码成功获取到了所有节点名称为li的文本内容,中间用空格隔开。 如果你想要一个一个获取,那还是少不了生成器,具体代码如下所示: from pyquery import PyQuery as pq doc = pq(html) items = doc('li').items() for item in items: print(item.text())

    节点操作

    pyquery提供了一系列方法对节点进行动态修改,比如为某个节点添加一个class,移除某个节点,这些操作有时会为提取信息带来便利。 add_class和remove_class html = ''' <div class="wrap"> <div id="container"> <ul class="list"> <li class="item-0">first-item</li> <li class="item-1"><a href="link2.html">second item</a></li> <li class="item-0 active"><a href="link3.html"><span class=""bold>third item</span></a></li> <li class="item-1 active"><a href="link4.html">fourth item</a></li> <li class="item-0"><a href="link5.html">fifth item</a></li> </ul> </div> </div> ''' from pyquery import PyQuery as pq doc = pq(html) li = doc('.list .item-0.active') print(li) li.remove_class('active') print(li) li.add_class('active') print(li) 运行结果如下所示: <li class="item-0 active"><a href="link3.html"><span class="" bold="">third item</span></a></li> <li class="item-0"><a href="link3.html"><span class="" bold="">third item</span></a></li> <li class="item-0 active"><a href="link3.html"><span class="" bold="">third item</span></a></li> 上面有三段输出内容,首先先获取一个li节点,然后再删除active类属性,第三段代码是添加active类属性。

    伪类选择器

    CSS选择器之所以强大,还有一个很重要的原因,那就是它可以支持多种多样的伪类选择器,例如选择第一个节点、最后一个节点、奇偶数节点、包含某一文本的节点。 html = ''' <div class="wrap"> <div id="container"> <ul class="list"> <li class="item-0">first-item</li> <li class="item-1"><a href="link2.html">second item</a></li> <li class="item-0 active"><a href="link3.html"><span class=""bold>third item</span></a></li> <li class="item-1 active"><a href="link4.html">fourth item</a></li> <li class="item-0"><a href="link5.html">fifth item</a></li> </ul> </div> </div> ''' from pyquery import PyQuery as pq doc = pq(html) li = doc('li:first-child') # 第一个li节点 print(li) li = doc('li:last-child') # 最后一个li节点 print(li) li = doc('li:nth-child(2)') # 第二个位置的li节点 print(li) li = doc('li:gt(2)') # 第三个之后的li节点 print(li) li = doc('li:nth-child(2n)') # 偶数位置的li节点 print(li) li = doc('li:contains(second)') # 包含second文本的li节点 print(li) 至此,关于pyquery的所有内容都讲完了,接下来就进入实战了,光说不练肯定是不行的,只有通过实战才能正真学会刚刚所学会的知识。

    实战

    本次我带来的实战内容是爬取猫眼电影的TOP100的排行榜及评分情况。

    准备

    工欲善其事,必先利其器。首先,我们要准备几个库:pyquery、requests。 安装过程如下: pip install pyquery pip install requests

    前言

    寒假又到来了,小伙伴们准备怎么过呢? 在大冬天里,躲在被窝刷剧是最舒服的,好怀念当年的生活啊~ 所以今天就来爬取猫眼电影的TOP100排行榜,为冬眠做好准备。 网站链接: https://maoyan.com/board/4

    需求分析与功能实现

    获取电影名称

    需要的信息藏在class为board-item-maindiv标签下的a标签内,因此我们需要获取其文本信息。 核心代码如下所示: movie_name = doc('.board-item-main .board-item-content .movie-item-info p a').text()

    获取主演信息

    从上图可以看到,主演的信息位于board-item-main的子节点p标签内,因此我们可以这样获取主演信息。 核心代码如下所示: p = doc('.board-item-main .board-item-content .movie-item-info') star = p.children('.star').text()

    获取上映时间

    从前面的图片也可以看到,上映时间的信息与主演信息的节点是兄弟节点,所以我们可以这样写代码。 p = doc('.board-item-main .board-item-content .movie-item-info') time = p.children('.releasetime').text()

    获取评分

    要获取每一部电影的评分相对要复杂一些,为什么这样说呢?我们来看下面的图片。 从上面的图片可以看到,整数部分与小数部分被分割了成了两部分。因此需要分别获取两部分的数据,在进行拼接即可。 核心代码如下所示: score1 = doc('.board-item-main .movie-item-number.score-num .integer').text().split() score2 = doc('.board-item-main .movie-item-number.score-num .fraction').text().split() score = [score1[i]+score2[i] for i in range(0, len(score1))]

    关于翻页

    打开网页的时候,你会发现榜单一共有10页,每一页的URL都不相同,那该怎么办呢?总不能每一次都手动更换URL地址吧。 先来观察前四页的URL地址吧。 https://maoyan.com/board/4 # 第一页 https://maoyan.com/board/4?offset=10 # 第二页 https://maoyan.com/board/4?offset=20 # 第三页 https://maoyan.com/board/4?offset=30 # 第四页 观察完之后,我想不需要我过多叙述它的特点了吧。 接下来我们就可以构建每一页的URL地址了,具体代码如下所示: def get_url(self, page): url = f'https://maoyan.com/board/4?offset={page}' return url if __name__ == '__main__': maoyan = MaoYan() for page in range(10): url = maoyan.get_url(page*10)

    反爬虫

    未闻Code 未聞Code 逃不掉被反爬蟲

    一、为什么要反爬虫

    1、爬虫占总PV比例较高,这样浪费钱(尤其是三月份爬虫)。 三月份爬虫是个什么概念呢? 每年的三月份我们会迎接一次爬虫高峰期。 最初我们百思不得其解。 直到有一次,四月份的时候,我们删除了一个url,然后有个爬虫不断的爬取url,导致大量报错,测试开始找我们麻烦。 我们只好特意为这个爬虫发布了一次站点,把删除的url又恢复回去了。 但是当时我们的一个组员表示很不服,说,我们不能干掉爬虫,也就罢了,还要专门为它发布,这实在是太没面子了。 于是出了个主意,说:url可以上,但是,绝对不给真实数据。 于是我们就把一个静态文件发布上去了。 报错停止了,爬虫没有停止,也就是说对方并不知道东西都是假的。 这个事情给了我们一个很大的启示,也直接成了我们反爬虫技术的核心:变更。 后来有个学生来申请实习。 我们看了简历发现她爬过携程。 后来面试的时候确认了下,果然她就是四月份害我们发布的那个家伙。 不过因为是个妹子,技术也不错,后来就被我们招安了。 现在已经快正式入职了。 后来我们一起讨论的时候,她提到了,有大量的硕士在写论文的时候会选择爬取OTA数据,并进行舆情分析。 因为五月份交论文,所以嘛,大家都是读过书的,你们懂的,前期各种DotA,LOL,到了三月份了,来不及了,赶紧抓数据,四月份分析一下,五月份交论文。 就是这么个节奏。 2、公司可免费查询的资源被批量抓走,丧失竞争力,这样少赚钱。 OTA的价格可以在非登录状态下直接被查询,这个是底线。 如果强制登陆,那么可以通过封杀账号的方式让对方付出代价,这也是很多网站的做法。 但是我们不能强制对方登录。 那么如果没有反爬虫,对方就可以批量复制我们的信息,我们的竞争力就会大大减少。 竞争对手可以抓到我们的价格,时间长了用户就会知道,只需要去竞争对手那里就可以了,没必要来携程。 这对我们是不利的。 3、爬虫是否涉嫌违法?如果是的话,是否可以起诉要求赔偿?这样可以赚钱。 这个问题我特意咨询了法务,最后发现这在国内还是个擦边球,就是有可能可以起诉成功,也可能完全无效。 所以还是需要用技术手段来做最后的保障。

    二、反什么样的爬虫

    1、十分低级的应届毕业生 开头我们提到的三月份爬虫,就是一个十分明显的例子。 应届毕业生的爬虫通常简单粗暴,根本不管服务器压力,加上人数不可预测,很容易把站点弄挂。 顺便说下,通过爬携程来获取offer这条路已经行不通了。 因为我们都知道,第一个说漂亮女人像花的人,是天才。 而第二个。。。你们懂的吧? 2、十分低级的创业小公司 现在的创业公司越来越多,也不知道是被谁忽悠的然后大家创业了发现不知道干什么好,觉得大数据比较热,就开始做大数据。 分析程序全写差不多了,发现自己手头没有数据。 怎么办?写爬虫爬啊。 于是就有了不计其数的小爬虫,出于公司生死存亡的考虑,不断爬取数据。 3、不小心写错了没人去停止的失控小爬虫 携程上的点评有的时候可能高达60%的访问量是爬虫。 我们已经选择直接封锁了,它们依然孜孜不倦地爬取。 什么意思呢? 就是说,他们根本爬不到任何数据,除了http code是200以外,一切都是不对的,可是爬虫依然不停止这个很可能就是一些托管在某些服务器上的小爬虫,已经无人认领了,依然在辛勤地工作着。 4、成型的商业对手 这个是最大的对手,他们有技术,有钱,要什么有什么,如果和你死磕,你就只能硬着头皮和他死磕。 5、抽风的搜索引擎 大家不要以为搜索引擎都是好人,他们也有抽风的时候,而且一抽风就会导致服务器性能下降,请求量跟网络攻击没什么区别。

    三、什么是爬虫和反爬虫

    因为反爬虫暂时是个较新的领域,因此有些定义要自己下。 我们内部定义是这样的:
  • 爬虫:使用任何技术手段,批量获取网站信息的一种方式。关键在于批量。
  • 反爬虫:使用任何技术手段,阻止别人批量获取自己网站信息的一种方式。 关键也在于批量。
  • 误伤:在反爬虫的过程中,错误的将普通用户识别为爬虫。 误伤率高的反爬虫策略,效果再好也不能用。
  • 拦截:成功地阻止爬虫访问。这里会有拦截率的概念。 通常来说,拦截率越高的反爬虫策略,误伤的可能性就越高。因此需要做个权衡。
  • 资源:机器成本与人力成本的总和。
  • 这里要切记,人力成本也是资源,而且比机器更重要。 因为,根据摩尔定律,机器越来越便宜。 而根据IT行业的发展趋势,程序员工资越来越贵。 因此,让对方加班才是王道,机器成本并不是特别值钱。

    四、知己知彼:如何编写简单爬虫

    要想做反爬虫,我们首先需要知道如何写个简单的爬虫。 目前网络上搜索到的爬虫资料十分有限,通常都只是给一段python代码。 python是一门很好的语言,但是用来针对有反爬虫措施的站点做爬虫,真的不是最优选择。 更讽刺的是,通常搜到的python爬虫代码都会使用一个lynx的user-agent。 你们应该怎么处理这个user-agent,就不用我来说了吧? 通常编写爬虫需要经过这么几个过程:
  • 分析页面请求格式
  • 创建合适的http请求
  • 批量发送http请求,获取数据
  • 举个例子,直接查看携程生产url。 在详情页点击“确定”按钮,会加载价格。 假设价格是你想要的,那么抓出网络请求之后,哪个请求才是你想要的结果呢? 答案出乎意料的简单,你只需要用根据网络传输数据量进行倒序排列即可。 因为其他的迷惑性的url再多再复杂,开发人员也不会舍得加数据量给他。

    五、知己知彼:如何编写高级爬虫

    那么爬虫进阶应该如何做呢? 通常所谓的进阶有以下几种: 分布式 通常会有一些教材告诉你,为了爬取效率,需要把爬虫分布式部署到多台机器上。 这完全是骗人的。 分布式唯一的作用是:防止对方封IP。 封IP是终极手段,效果非常好,当然,误伤起用户也是非常爽的。 模拟JavaScript 有些教程会说,模拟javascript,抓取动态网页,是进阶技巧。 但是其实这只是个很简单的功能。 因为,如果对方没有反爬虫,你完全可以直接抓ajax本身,而无需关心js怎么处理的。 如果对方有反爬虫,那么javascript必然十分复杂,重点在于分析,而不仅仅是简单的模拟。 换句话说:这应该是基本功。 PhantomJs 这个是一个极端的例子。 这个东西本意是用来做自动测试的,结果因为效果很好,很多人拿来做爬虫。 但是这个东西有个硬伤,就是:效率。 此外PhantomJs也是可以被抓到的,出于多方面原因,这里暂时不讲。  

    六、不同级别爬虫的优缺点

    越是低级的爬虫,越容易被封锁,但是性能好,成本低。 越是高级的爬虫,越难被封锁,但是性能低,成本也越高。 当成本高到一定程度,我们就可以无需再对爬虫进行封锁。 经济学上有个词叫边际效应。 付出成本高到一定程度,收益就不是很多了。 那么如果对双方资源进行对比,我们就会发现,无条件跟对方死磕,是不划算的。 应该有个黄金点,超过这个点,那就让它爬好了。 毕竟我们反爬虫不是为了面子,而是为了商业因素。

    七、如何设计一个反爬虫系统(常规架构)

    有个朋友曾经给过我这样一个架构: 1、对请求进行预处理,便于识别; 2、识别是否是爬虫; 3、针对识别结果,进行适当的处理; 当时我觉得,听起来似乎很有道理,不愧是架构,想法就是和我们不一样。 后来我们真正做起来反应过来不对了。 因为: 如果能识别出爬虫,哪还有那么多废话? 想怎么搞它就怎么搞它。 如果识别不出来爬虫,你对谁做适当处理? 三句话里面有两句是废话,只有一句有用的,而且还没给出具体实施方式。 那么:这种架构(师)有什么用? 因为当前存在一个架构师崇拜问题,所以很多创业小公司以架构师名义招开发。 给出的title都是:初级架构师,架构师本身就是个高级岗位,为什么会有初级架构。 这就相当于:初级将军/初级司令。 最后去了公司,发现十个人,一个CTO,九个架构师,而且可能你自己是初级架构师,其他人还是高级架构师。 不过初级架构师还不算坑爹了,有些小创业公司还招CTO做开发呢。 传统反爬虫手段 1、后台对访问进行统计,如果单个IP访问超过阈值,予以封锁。 这个虽然效果还不错,但是其实有两个缺陷,一个是非常容易误伤普通用户,另一个就是,IP其实不值钱,几十块钱甚至有可能买到几十万个IP。 所以总体来说是比较亏的。 不过针对三月份呢爬虫,这点还是非常有用的。 2、后台对访问进行统计,如果单个session访问超过阈值,予以封锁。 这个看起来更高级了一些,但是其实效果更差,因为session完全不值钱,重新申请一个就可以了。 3、后台对访问进行统计,如果单个userAgent访问超过阈值,予以封锁。 这个是大招,类似于抗生素之类的,效果出奇的好,但是杀伤力过大,误伤非常严重,使用的时候要非常小心。 至今为止我们也就只短暂封杀过mac下的火狐。 4、以上的组合 组合起来能力变大,误伤率下降,在遇到低级爬虫的时候,还是比较好用的。 由以上我们可以看出,其实爬虫反爬虫是个游戏,RMB玩家才最牛逼。 因为上面提到的方法,效果均一般,所以还是用JavaScript比较靠谱。 也许有人会说:javascript做的话,不是可以跳掉前端逻辑,直接拉服务吗? 怎么会靠谱呢? 因为啊,我是一个标题党啊。 JavaScript不仅仅是做前端。 跳过前端不等于跳过JavaScript。 也就是说:我们的服务器是nodejs做的。
    思考题:我们写代码的时候,最怕碰到什么代码? 什么代码不好调试?
    eval eval已经臭名昭著了,它效率低下,可读性糟糕。 正是我们所需要的。 goto js对goto支持并不好,因此需要自己实现goto。 混淆 目前的minify工具通常是minify成abcd之类简单的名字,这不符合我们的要求。 我们可以minify成更好用的,比如阿拉伯语。 为什么呢? 因为阿拉伯语有的时候是从左向右写,有的时候是从右向左写,还有的时候是从下向上写。 除非对方雇个阿拉伯程序员,否则非头疼死不可。 不稳定代码 什么bug不容易修? 不容易重现的bug不好修。 因此,我们的代码要充满不确定性,每次都不一样。 代码演示 下载代码本身,可以更容易理解。 这里简短介绍下思路:
  • 纯JAVASCRIPT反爬虫DEMO,通过更改连接地址,来让对方抓取到错误价格。 这种方法简单,但是如果对方针对性的来查看,十分容易被发现。
  • 纯JAVASCRIPT反爬虫DEMO,更改key。 这种做法简单,不容易被发现。 但是可以通过有意爬取错误价格的方式来实现。
  • 纯JAVASCRIPT反爬虫DEMO,更改动态key。 这种方法可以让更改key的代价变为0,因此代价更低。
  • 纯JAVASCRIPT反爬虫DEMO,十分复杂的更改key。 这种方法,可以让对方很难分析,如果加了后续提到的浏览器检测,更难被爬取。
  • 到此为止。 前面我们提到了边际效应,就是说,可以到此为止了。 后续再投入人力就得不偿失了。 除非有专门的对手与你死磕。 不过这个时候就是为了尊严而战,不是为了商业因素了。 浏览器检测 针对不同的浏览器,我们的检测方式是不一样的。
  • IE 检测bug;
  • FF 检测对标准的严格程度;
  • Chrome 检测强大特性。
  • 八、我抓到你了——然后该怎么办

    不会引发生产事件——直接拦截 可能引发生产事件——给假数据(也叫投毒) 此外还有一些发散性的思路。 例如是不是可以在响应里做SQL注入? 毕竟是对方先动的手。 不过这个问题法务没有给具体回复,也不容易和她解释。 因此暂时只是设想而已。 1、技术压制 我们都知道,DotA AI里有个de命令,当AI被击杀后,它获取经验的倍数会提升。 因此,前期杀AI太多,AI会一身神装,无法击杀。 正确的做法是,压制对方等级,但是不击杀。 反爬虫也是一样的,不要一开始就搞太过分,逼人家和你死磕。 2、心理战 挑衅、怜悯、嘲讽、猥琐。 以上略过不提,大家领会精神即可。 3、放水 这个可能是是最高境界了。 程序员都不容易,做爬虫的尤其不容易。 可怜可怜他们给他们一小口饭吃吧。 没准过几天你就因为反爬虫做得好,改行做爬虫了。


    Importing EasyGui

In order to use EasyGui, you must import it. The simplest import statement is:

import easygui

If you use this form, then to access the EasyGui functions, you must prefix them with the name “easygui”, this way: easygui.msgbox(...)

One alternative is to import EasyGui this way:

from easygui import *

This makes it easier to invoke the EasyGui functions; you won’t have to prefix the function names with “easygui”. You can just code something like this: msgbox(...)

A third alternative is to use something like the following import statement:

import easygui as g

This allows you to keep the EasyGui namespace separate with a minimal amount of typing. You can access easygui functions like this: g.msgbox(...)

This third alternative is actually the best way to do it once you get used to Python and EasyGui.

    Using EasyGui

Once your module has imported EasyGui, GUI operations are a simple matter of invoking EasyGui functions with a few parameters. For example, using EasyGui, the famous “Hello, world!” program looks like this:

from easygui import *
msgbox("Hello, world!")

To see a demo of what EasyGui output looks like, invoke EasyGui from the command line, this way: python easygui.py
To see examples of code that invokes the EasyGui functions, look at the demonstration code at the end of easygui.py.

    Default arguments for EasyGui functions

For all of the boxes, the first two arguments are for message and title, in that order. In some cases, this might not be the most user-friendly arrangement (for example, the dialogs for getting directory and filenames ignore the message argument), but I felt that keeping this consistent across all widgets was the more important consideration. Most arguments to EasyGui functions have defaults. Almost all of the boxes display a message and a title. The title defaults to the empty string, and the message usually has a simple default. This makes it possible to specify as few arguments as you need in order to get the result that you want. For instance, the title argument to msgbox is optional, so you can call msgbox specifying only a message, this way:

msgbox("Danger, Will Robinson!")

or specifying a message and a title, this way:

msgbox("Danger, Will Robinson!", "Warning!")

On the various types of buttonbox, the default message is “Shall I continue?”, so you can (if you wish) invoke them without any arguments at all. Here we invoke ccbox (the Continue/Cancel box, which returns a boolean value) without any arguments at all:

if ccbox():
    pass    # user chose to continue
else:
    return  # user chose to cancel

    Using keyword arguments when calling EasyGui functions

    It is possible to use keyword arguments when calling EasyGui functions. Suppose for instance that you wanted to use a buttonbox, but (for whatever reason) did not want to specify the title (second) positional argument. You could still specify the choices argument (the third argument) using a keyword, this way: choices = ["Yes","No","Only on Friday"] reply = choicebox("Do you like to eat fish?", choices=choices)

    Using buttonboxes

    There are a number of functions built on top of buttonbox() for common needs.

    msgbox

msgbox displays a message and offers an OK button. You can send whatever message you want, along with whatever title you want. You can even over-ride the default text of “OK” on the button if you wish. Here is the signature of the msgbox function:

def msgbox(msg="(Your message goes here)", title=" ", ok_button="OK"):
    ....

The clearest way to over-ride the button text is to do it with a keyword argument, like this:

msgbox("Backup complete!", ok_button="Good job!")

Here are a couple of examples:

msgbox("Hello, world!")

msg = "Do you want to continue?"
title = "Please Confirm"
if ccbox(msg, title):  # show a Continue/Cancel dialog
    pass               # user chose Continue
else:                  # user chose Cancel
    sys.exit(0)

    ccbox

    ccbox offers a choice of Continue and Cancel, and returns either True (for continue) or False (for cancel).

    ynbox

ynbox offers a choice of Yes and No, and returns either True or False.

    buttonbox

    To specify your own set of buttons in a buttonbox, use the buttonbox() function. The buttonbox can be used to display a set of buttons of your choice. When the user clicks on a button, buttonbox() returns the text of the choice. If the user cancels or closes the buttonbox, the default choice (the first choice) is returned. buttonbox displays a message, a title, and a set of buttons. Returns the text of the button that the user selected.
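
An illustrative call, with arbitrary message, title and button labels:

from easygui import buttonbox

choice = buttonbox("How should we proceed?", "Deployment", choices=["Deploy", "Rollback", "Abort"])
print("User picked:", choice)  # the text of the clicked button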

    indexbox

indexbox displays a message, a title, and a set of buttons. Returns the index of the user’s choice. For example, if you invoked indexbox with three choices (A, B, C), indexbox would return 0 if the user picked A, 1 if he picked B, and 2 if he picked C.

    boolbox

boolbox (boolean box) displays a message, a title, and a set of buttons. Returns 1 if the first button is chosen; otherwise returns 0. Here is a simple example of a boolbox():

message = "What does she say?"
title = ""
if boolbox(message, title, ["She loves me", "She loves me not"]):
    sendher("Flowers")  # this is just a sample function that you might write
else:
    pass

    How to show an image in a buttonbox

When you invoke the buttonbox function (or other functions that display a button box, such as msgbox, indexbox, ynbox, etc.), you can specify the keyword argument image=xxx where xxx is the filename of an image. The file can be .gif; usually you can also use other image formats such as .png. Note: the types of files supported depend on how you installed Python. If other formats don’t work, you may need to install the PIL library. If an image argument is specified, the image file will be displayed after the message. Here is some sample code from EasyGui’s demonstration routine:

image = "python_and_check_logo.gif"
msg = "Do you like this picture?"
choices = ["Yes", "No", "No opinion"]
reply = buttonbox(msg, image=image, choices=choices)

If you click on one of the buttons on the bottom, its value will be returned in ‘reply’. You may also click on the image. In that case, the image filename is returned.

    Letting the user select from a list of choices

    choicebox

Buttonboxes are good for offering the user a small selection of short choices. But if there are many choices, or the text of the choices is long, then a better strategy is to present them as a list. choicebox provides a way for a user to select from a list of choices. The choices are specified in a sequence (a tuple or a list). The choices will be given a case-insensitive sort before they are presented. The keyboard can be used to select an element of the list. Pressing “g” on the keyboard, for example, will jump the selection to the first element beginning with “g”. Pressing “g” again will jump the cursor to the next element beginning with “g”. At the end of the elements beginning with “g”, pressing “g” again will cause the selection to wrap around to the beginning of the list and jump to the first element beginning with “g”. If there is no element beginning with “g”, then the last element that occurs before the position where “g” would occur is selected. If there is no element before “g”, then the first element in the list is selected:

msg = "What is your favorite flavor?"
title = "Ice Cream Survey"
choices = ["Vanilla", "Chocolate", "Strawberry", "Rocky Road"]
choice = choicebox(msg, title, choices)

Another example of a choicebox:
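
As a stand-in second example (the generated item names are arbitrary), a longer list shows where a scrolling choicebox beats a row of buttons:

from easygui import choicebox

# a deliberately long list -- too many items to offer as buttons
choices = ["Item {:02d}".format(i) for i in range(1, 41)]
reply = choicebox("Pick an item from the long list", "Long list demo", choices)
print(reply)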

    multchoicebox

    The multchoicebox() function provides a way for a user to select from a list of choices. The interface looks just like the choicebox, but the user may select zero, one, or multiple choices. The choices are specified in a sequence (a tuple or a list). The choices will be given a case-insensitive sort before they are presented.
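    A minimal sketch (the values are invented for illustration):

    from easygui import multchoicebox

    msg = "Which toppings do you want?"
    title = "Pizza Builder"
    toppings = ["Cheese", "Mushrooms", "Onions", "Peppers"]
    selected = multchoicebox(msg, title, toppings)
    # selected is a list of the chosen strings, or None if the user cancels
    print(selected)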

    Letting the user enter information

    enterbox

    enterbox is a simple way of getting a string from the user

    integerbox

    integerbox is a simple way of getting an integer from the user.
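    A small sketch of both functions (this assumes the usual easygui signatures, including the default/lowerbound/upperbound keyword arguments of integerbox; the prompts are invented):

    from easygui import enterbox, integerbox

    name = enterbox("What is your name?", "Enter name")   # a string, or None on cancel
    age = integerbox("How old are you?", "Enter age",
                     default=30, lowerbound=0, upperbound=120)
    print(name, age)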

    multenterbox

    multenterbox is a simple way of showing multiple enterboxes on a single screen. It returns a list of the values of the fields, or None if the user cancels the operation. Here is some example code that shows how values returned from multenterbox can be checked for validity before they are accepted:

    from __future__ import print_function
    import sys

    msg = "Enter your personal information"
    title = "Credit Card Application"
    fieldNames = ["Name", "Street Address", "City", "State", "ZipCode"]
    fieldValues = multenterbox(msg, title, fieldNames)
    if fieldValues is None:
        sys.exit(0)
    # make sure that none of the fields were left blank
    while 1:
        errmsg = ""
        for i, name in enumerate(fieldNames):
            if fieldValues[i].strip() == "":
                errmsg += "{} is a required field.\n\n".format(name)
        if errmsg == "":
            break  # no problems found
        fieldValues = multenterbox(errmsg, title, fieldNames, fieldValues)
        if fieldValues is None:
            break
    print("Reply was: {}".format(fieldValues))

    Note: the first line (from __future__ import print_function) is only necessary if you are using Python 2.x, and is only needed for this demo.

    Letting the user enter password information

    passwordbox

    A passwordbox is like an enterbox, but used for entering passwords: the text is masked as it is typed in.

    multpasswordbox

    multpasswordbox has the same interface as multenterbox, but when it is displayed, the last of the fields is assumed to be a password, and is masked with asterisks.
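    A minimal sketch of both (the field names are invented for illustration):

    from easygui import passwordbox, multpasswordbox

    pwd = passwordbox("Enter your password", "Login")         # typed characters are masked
    fields = ["Server", "User name", "Password"]               # the last field is masked with asterisks
    values = multpasswordbox("Enter your credentials", "Login", fields)
    print(values)   # a list of strings, or None if the user cancels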

    Displaying text

    EasyGui provides functions for displaying text.

    textbox

    The textbox() function displays text in a proportional font. The text will word-wrap.

    codebox

    The codebox() function displays text in a monospaced font and does not wrap. Note that you can pass codebox() and textbox() either a string or a list of strings. A list of strings will be converted to text before being displayed. This means that you can use these functions to display the contents of a file this way: import os filename = os.path.normcase("c:/autoexec.bat") f = open(filename, "r") text = f.readlines() f.close() codebox("Contents of file " + filename, "Show File Contents", text)

    Working with files

    A common need is to ask the user for a filename or for a directory. EasyGui provides a few basic functions for allowing a user to navigate through the file system and choose a directory or a file. (These functions are wrappers around widgets and classes in lib-tk.) Note that in the current version of EasyGui, the startpos argument is not supported.

    diropenbox

    diropenbox returns the name of a directory

    fileopenbox

    fileopenbox returns the name of a file

    filesavebox

    filesavebox returns the name of a file
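    A small sketch of the three functions (this assumes the usual msg/title/default arguments; the file names are invented):

    from easygui import diropenbox, fileopenbox, filesavebox

    folder = diropenbox("Choose a working directory", "Pick a folder")
    infile = fileopenbox("Choose a file to read", "Open", default="*.txt")
    outfile = filesavebox("Choose where to save the report", "Save as", default="report.txt")
    print(folder, infile, outfile)   # each is a path string, or None if the user cancels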

    Remembering User Settings

    EgStore

    A common need is to ask the user for some settings and then to “persist” them (store them on disk), so that the next time the user runs your application you can restore the previous values. To make storing and restoring user settings easier, EasyGui provides a class called EgStore. To remember some settings, your application must define a class (let’s call it Settings, although you can call it anything you want) that inherits from EgStore, and create an object of that class (let’s call it settings). The constructor (the __init__ method) of the Settings class initializes all of the values that you wish to remember. Once you have done this, you can remember the settings simply by assigning values to instance variables of the settings object, and use the settings.store() method to persist the settings object to disk. Here is an example of code using the Settings class:

    from easygui import EgStore

    # -----------------------------------------------------------------------
    # define a class named Settings as a subclass of EgStore
    # -----------------------------------------------------------------------
    class Settings(EgStore):
        def __init__(self, filename):  # filename is required
            # -------------------------------------------------
            # Specify default/initial values for variables that
            # this particular application wants to remember.
            # -------------------------------------------------
            self.userId = ""
            self.targetServer = ""
            # -------------------------------------------------
            # For subclasses of EgStore, these must be
            # the last two statements in __init__
            # -------------------------------------------------
            self.filename = filename  # this is required
            self.restore()

    # Create the settings object.
    # If the settings file exists, this will restore its values from that file.
    # Note that the "filename" argument is required and that the directory
    # for the persistent file must already exist.
    settingsFilename = "settings.txt"
    settings = Settings(settingsFilename)

    # Now use the settings object.
    # Initialize the "user" and "server" variables.
    # In a real application, we'd probably have the user enter them via enterbox.
    user = "obama_barak"
    server = "whitehouse1"

    # Save the variables as attributes of the "settings" object
    settings.userId = user
    settings.targetServer = server
    settings.store()  # persist the settings
    print("\nInitial settings")
    print(settings)

    # Run code that gets a new value for userId,
    # then persist the settings with the new value
    user = "biden_joe"
    settings.userId = user
    settings.store()
    print("\nSettings after modification")
    print(settings)

    # Delete a setting variable
    del settings.userId
    print("\nSettings after deletion of userId")
    print(settings)

    Here is an example of code using a dedicated function to create the settings object:

    from easygui import read_or_create_settings

    # Create the settings object.
    settings = read_or_create_settings('settings1.txt')

    # Save the variables as attributes of the "settings" object
    settings.userId = "obama_barak"
    settings.targetServer = "whitehouse1"
    settings.store()  # persist the settings
    print("\nInitial settings")
    print(settings)

    # Run code that gets a new value for userId,
    # then persist the settings with the new value
    user = "biden_joe"
    settings.userId = user
    settings.store()
    print("\nSettings after modification")
    print(settings)

    # Delete a setting variable
    del settings.userId
    print("\nSettings after deletion of userId")
    print(settings)

    Trapping Exceptions

    exceptionbox

    Sometimes exceptions are raised… even in EasyGui applications. Depending on how you run your application, the stack trace might be thrown away, or written to stdout while your application crashes. EasyGui provides a better way of handling exceptions via exceptionbox. Exceptionbox displays the stack trace in a codebox and may allow you to continue processing. Exceptionbox is easy to use. Here is a code example: try: someFunction() # this may raise an exception except: exceptionbox()

    Create a package for Android

    You can create a package for android using the python-for-android project. This page explains how to download and use it directly on your own machine (see Packaging with python-for-android) or use the Buildozer tool to automate the entire process. You can also see Packaging your application for the Kivy Launcher to run kivy programs without compiling them. For new users, we recommend using Buildozer as the easiest way to make a full APK. You can also run your Kivy app without a compilation step with the Kivy Launcher app. Kivy applications can be released on an Android market such as the Play store, with a few extra steps to create a fully signed APK. The Kivy project includes tools for accessing Android APIs to accomplish vibration, sensor access, texting etc. These, along with information on debugging on the device, are documented at the main Android page.

    Buildozer

    Buildozer is a tool that automates the entire build process. It downloads and sets up all the prerequisites for python-for-android, including the Android SDK and NDK, then builds an APK that can be automatically pushed to the device. Buildozer currently works only on Linux and is a beta release, but it already works well and can significantly simplify the APK build. You can get Buildozer at https://github.com/kivy/buildozer: git clone https://github.com/kivy/buildozer.git cd buildozer sudo python setup.py install This installs buildozer on your system. Afterwards, navigate to your project directory and run: buildozer init This creates a buildozer.spec file controlling your build configuration. You should edit it appropriately with your app name etc.; you can set variables to control most or all of the parameters passed to python-for-android. Install buildozer’s dependencies. Finally, plug in your Android device and run: buildozer android debug deploy run to build, push and automatically run the APK on your device. Buildozer has many available options and tools to help you; the steps above are just the simplest way to build and run your APK. The full documentation is available here. You can also check the Buildozer README at https://github.com/kivy/buildozer.

    Packaging with python-for-android

    You can also package directly with python-for-android, which can give you more control but requires you to manually download parts of the Android toolchain. See the python-for-android documentation for full details.

    Packaging your application for the Kivy Launcher

    The Kivy launcher is an Android application that runs any Kivy examples stored on your SD Card. To install the Kivy launcher, you must: Go to the Kivy Launcher page on the Google Play Store Click on Install Select your phone… And you’re done! If you don’t have access to the Google Play Store on your phone/tablet, you can download and install the APK manually from http://kivy.org/#download. Once the Kivy launcher is installed, you can put your Kivy applications in the Kivy directory in your external storage directory (often available at /sdcard even in devices where this memory is internal), e.g. /sdcard/kivy/<yourapplication> <yourapplication> should be a directory containing: # Your main application file: main.py # Some info Kivy requires about your app on android: android.txt The file android.txt must contain: title=<Application Title> author=<Your Name> orientation=<portrait|landscape> These options are just a very basic configuration. If you create your own APK using the tools above, you can choose many other settings.
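    For reference, a minimal main.py that the launcher can run might look like the sketch below (the app name and label text are invented); the android.txt placed next to it would contain the three lines listed above (title, author, orientation).

    # /sdcard/kivy/helloapp/main.py -- a minimal Kivy app for the Kivy Launcher
    from kivy.app import App
    from kivy.uix.label import Label

    class HelloApp(App):
        def build(self):
            # the widget returned here becomes the root of the UI
            return Label(text="Hello from the Kivy Launcher")

    if __name__ == "__main__":
        HelloApp().run()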

    Installation of Examples

    Kivy comes with many examples, and these can be a great place to start trying the Kivy launcher. You can run them as follows:
    1. Download the Kivy demos for Android (https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/kivy/kivydemo-for-android.zip)
    2. Unzip the contents and go to the folder kivydemo-for-android
    3. Copy all the subfolders here to /sdcard/kivy
    4. Run the launcher and select one of the Pictures, Showcase, Touchtracer, Cymunk or other demos…

    Release on the market

    If you have built your own APK with Buildozer or with python-for-android, you can create a release version that may be released on the Play store or other Android markets. To do this, you must run Buildozer with the release parameter (e.g. buildozer android release), or if using python-for-android use the --release option to build.py. This creates a release APK in the bin directory, which you must properly sign and zipalign. The procedure for doing this is described in the Android documentation at https://developer.android.com/studio/publish/app-signing.html#signing-manually - all the necessary tools come with the Android SDK.

    Targeting Android

    Kivy is designed to operate identically across platforms and as a result, makes some clear design decisions. It includes its own set of widgets and by default, builds an APK with all the required core dependencies and libraries. It is possible to target specific Android features, both directly and in a (somewhat) cross-platform way. See the Using Android APIs section of the Kivy on Android documentation for more details.

    Python one-liners

    As you learn Python, you quickly find that it makes many problems easy to solve; some fairly involved tasks can even be handled with a single line of code. Below are 50 useful Python one-liners; hopefully some of these tricks will come in handy.

    1. Anagrams

    Two words are anagrams if they contain the same letters in a different order; for example, “silent” and “listen” are anagrams, while “apple” and “aplee” are not. from collections import Counter s1 = 'below' s2 = 'elbow' print('anagram') if Counter(s1) == Counter(s2) else print('not an anagram') One line of Python is enough to decide.

    2. Binary to decimal

    decimal = int('1010', 2) print(decimal) #10

    3. Convert a string to lowercase

    print("Hi my name is XiaoF".lower()) # 'hi my name is xiaof' print("Hi my name is XiaoF".casefold()) # 'hi my name is xiaof'

    4. Convert a string to uppercase

    print("hi my name is XiaoF".upper()) # 'HI MY NAME IS XIAOF'

    5. Convert a string to bytes

    print("convert string to bytes using encode method".encode()) # b'convert string to bytes using encode method'

    6. Copy a file

    import shutil shutil.copyfile('source.txt', 'dest.txt')

    7. Quicksort

    qsort = lambda l: l if len(l) <= 1 else qsort([x for x in l[1:] if x < l[0]]) + [l[0]] + qsort([x for x in l[1:] if x >= l[0]]) print(qsort([17, 29, 11, 97, 103, 5])) # [5, 11, 17, 29, 97, 103]

    8. Sum of the first n integers

    n = 10 print(sum(range(0, n+1))) # 55

    9. Swap two variables

    a,b = b,a

    10. Fibonacci sequence

    fib = lambda x: x if x<=1 else fib(x-1) + fib(x-2) print(fib(20)) # 6765

    11. Flatten a nested list

    main_list = [[0, 1, 2], [11, 12, 13], [52, 53, 54]] result = [item for sublist in main_list for item in sublist] print(result) > [0, 1, 2, 11, 12, 13, 52, 53, 54]

    12. Run an HTTP server

    python3 -m http.server 8000 python2 -m SimpleHTTPServer

    13. Reverse a list

    numbers = [0, 1, 2, 11, 12, 13, 52, 53, 54] print(numbers[::-1]) # [54, 53, 52, 13, 12, 11, 2, 1, 0]

    14. Factorial

    import math fact_5 = math.factorial(5) print(fact_5) # 120

    15. for and if in a list comprehension

    even_list = [number for number in [1, 2, 3, 4] if number % 2 == 0] print(even_list) # [2, 4]

    16. Longest string in a list

    words = ['This', 'is', 'a', 'list', 'of', 'words'] result = max(words, key=len) print(result) # 'words'

    17. List comprehension

    li = [num for num in range(0, 10)] print(li) # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

    18. Set comprehension

    num_set = {num for num in range(0, 10)} print(num_set) # {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}

    19. Dict comprehension

    dict_numbers = {x: x*x for x in range(1, 5)} print(dict_numbers) # {1: 1, 2: 4, 3: 9, 4: 16}

    20. if-else

    print("even") if 4 % 2==0 else print("odd")

    21. Infinite loop

    while 1:0

    22. Check data types

    print(isinstance(2, int)) # True print(isinstance("allwin", str)) # True print(isinstance([3, 4, 1997], list)) # True

    23. While loop

    a = 5 while a > 0: a = a - 1 print(a) # 0

    24. Write to a file with print

    print("Hello, World!", file=open('file.txt', 'w')) Here the message is written to file.txt instead of being printed to the screen.

    25. Count occurrences of a character in a string

    print("umbrella".count('l')) # 2

    26. Merge lists

    list1 = [1, 2, 4] list2 = ['XiaoF'] list1.extend(list2) print(list1) # [1, 2, 4, 'XiaoF']

    27. Merge dictionaries

    dict1 = {'name': 'weiwei', 'age': 23} dict2 = {'city': 'Beijing'} dict1.update(dict2) print(dict1) # {'name': 'weiwei', 'age': 23, 'city': 'Beijing'}

    28. Merge sets

    set1 = {0, 1, 2} set2 = {11, 12, 13} set1.update(set2) print(set1) # {0, 1, 2, 11, 12, 13}

    29. Timestamp

    import time print(time.time())

    30. Most frequent element in a list

    test_list = [9, 4, 5, 4, 4, 5, 9, 5, 4] most_frequent_element = max(set(test_list), key=test_list.count) print(most_frequent_element) # 4

    31. Nested lists

    numbers = [[num] for num in range(10)] print(numbers) # [[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]]

    32. Octal to decimal

    print(int('30', 8)) # 24

    33. Build a dict from keyword arguments

    result = dict(name='XiaoF', age=23) print(result) # {'name': 'XiaoF', 'age': 23}

    34. Quotient and remainder

    quotient, remainder = divmod(4, 5) print(quotient, remainder) # 0 4 The divmod() function returns a tuple containing the quotient and the remainder of dividing the first argument by the second.

    35. Remove duplicates from a list

    print(list(set([4, 4, 5, 5, 6]))) # [4, 5, 6]

    36. Sort a list in ascending order

    print(sorted([5, 2, 9, 1])) # [1, 2, 5, 9]

    37. Sort a list in descending order

    print(sorted([5, 2, 9, 1], reverse=True)) # [9, 5, 2, 1]

    38. Get the lowercase alphabet

    import string print(string.ascii_lowercase) # abcdefghijklmnopqrstuvwxyz

    39. Get the uppercase alphabet

    import string print(string.ascii_uppercase) # ABCDEFGHIJKLMNOPQRSTUVWXYZ

    40. Get the digits 0-9 as a string

    import string print(string.digits) # 0123456789

    41. Hexadecimal to decimal

    print(int('da9', 16)) # 3497

    42. Date and time

    import time print(time.ctime()) # Thu Aug 13 20:00:00 2021

    43. Convert a list of strings to integers

    print(list(map(int, ['1', '2', '3']))) # [1, 2, 3]

    44. Sort a dict by key

    d = {'one': 1, 'four': 4, 'eight': 8} result = {key: d[key] for key in sorted(d.keys())} print(result) # {'eight': 8, 'four': 4, 'one': 1}

    45. Sort a dict by value

    x = {1: 2, 3: 4, 4: 3, 2: 1, 0: 0} result = {k: v for k, v in sorted(x.items(), key=lambda item: item[1])} print(result) # {0: 0, 2: 1, 1: 2, 4: 3, 3: 4}

    46. Rotate a list

    li = [1, 2, 3, 4, 5] # li[n:] + li[:n] rotates left by n print(li[2:] + li[:2]) # [3, 4, 5, 1, 2] # li[-n:] + li[:-n] rotates right by n print(li[-1:] + li[:-1]) # [5, 1, 2, 3, 4]

    47. Remove digits from a string

    message = ''.join(list(filter(lambda x: x.isalpha(), 'abc123def4fg56vcg2'))) print(message) # abcdeffgvcg

    48. Transpose a matrix

    old_list = [[1, 2, 3], [3, 4, 6], [5, 6, 7]] result = list(list(x) for x in zip(*old_list)) print(result) # [[1, 3, 5], [2, 4, 6], [3, 6, 7]]

    49. Filter a list

    result = list(filter(lambda x: x % 2 == 0, [1, 2, 3, 4, 5, 6])) print(result) # [2, 4, 6]

    50. Unpacking

    a, *b, c = [1, 2, 3, 4, 5] print(a) # 1 print(b) # [2, 3, 4] print(c) # 5

    Web Scraping with MechanicalSoup

    Web Scraping Databases with Mechanical Soup and SQlite import mechanicalsoup import pandas as pd import sqlite3 # create browser object & open URL browser = mechanicalsoup.StatefulBrowser() browser.open("https://en.wikipedia.org/wiki/Comparison_of_Linux_distributions") # extract all table headers (entire "Distribution" column) th = browser.page.find_all("th", attrs={"class": "table-rh"}) # tidy up and slice off non-table elements distribution = [value.text.replace("\n", "") for value in th] distribution = distribution[:95] # extract table data (the rest of the table) td = browser.page.find_all("td") # tidy up and slice off non-table elements columns = [value.text.replace("\n", "") for value in td] columns = columns[6:1051] column_names = ["Founder", "Maintainer", "Initial_Release_Year", "Current_Stable_Version", "Security_Updates", "Release_Date", "System_Distribution_Commitment", "Forked_From", "Target_Audience", "Cost", "Status"] dictionary = {"Distribution": distribution} # insert column names and their data into a dictionary for idx, key in enumerate(column_names): dictionary[key] = columns[idx:][::11] # convert dictionary to data frame df = pd.DataFrame(data = dictionary) # create new database and cursor connection = sqlite3.connect("linux_distro.db") cursor = connection.cursor() # create database table and insert all data frame rows cursor.execute("create table linux (Distribution, " + ",".join(column_names)+ ")") for i in range(len(df)): cursor.execute("insert into linux values (?,?,?,?,?,?,?,?,?,?,?,?)", df.iloc[i]) # PERMANENTLY save inserted data in "linux_distro.db" connection.commit() connection.close()

    A GUI power tool: Gooey

    Transform command line applications into GUIs: turn a Python command line program into a GUI application. A GUI is a human-computer interface, in other words a way for people to interact with a computer, mainly through windows, icons and menus operated with the mouse and keyboard. A GUI library provides widgets, a collection of graphical control elements that are usually stacked together when building a program. To write a GUI application in Python you need such a library, and there are many to choose from. The most common is Tkinter, which is flexible and can build fairly complex interfaces, but its layout and widget handling take effort, and making the result look good is real work. This section introduces another GUI library, Gooey, which can generate a GUI application from a single line of code.

    Installation

    Installation is nothing special, the same as for any other library: pip install Gooey

    A simple example

    from gooey import Gooey, GooeyParser @Gooey(program_name="简单的实例") def main(): parser = GooeyParser(description="第一个示例!") parser.add_argument('文件路径', widget="FileChooser") # file chooser parser.add_argument('日期', widget="DateChooser") # date chooser args = parser.parse_args() # receive the values entered in the GUI print(args) if __name__ == '__main__': main() From the code alone you can guess what the window contains: a file chooser and a date chooser. To localize the interface, pass a language setting: from gooey import Gooey, GooeyParser @Gooey(program_name="简单的实例", language='chinese') def main(): parser = GooeyParser(description="第一个示例!") parser.add_argument('文件路径', widget="FileChooser") # file chooser parser.add_argument('日期', widget="DateChooser") # date chooser args = parser.parse_args() # receive the values entered in the GUI print(args) if __name__ == '__main__': main() The only change is the added language parameter. Other useful decorator options include show_sidebar=False and tabbed_groups=True. After packaging the script, the resulting exe executable ends up in the dist folder.
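    The show_sidebar and tabbed_groups options mentioned above are keyword arguments of the @Gooey decorator; with tabbed_groups=True each argument group becomes its own tab. A minimal sketch (the group and argument names are invented for illustration):

    from gooey import Gooey, GooeyParser

    @Gooey(program_name="Tabbed example", language="chinese",
           show_sidebar=False, tabbed_groups=True)
    def main():
        parser = GooeyParser(description="Demo of tabbed groups")
        files = parser.add_argument_group("Files")
        files.add_argument("input_path", widget="FileChooser")    # file chooser on the Files tab
        dates = parser.add_argument_group("Dates")
        dates.add_argument("start_date", widget="DateChooser")    # date chooser on the Dates tab
        print(parser.parse_args())

    if __name__ == "__main__":
        main()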

    Summary

    This was a quick introduction to Gooey. Personally I find it more convenient than Tkinter: for simple GUI front-ends Gooey can generate the interface very quickly, whereas Tkinter would take noticeably more effort. Gooey has other features as well; explore them if you are interested.

    Playwright, a scraping power tool

    Scraping the web with Playwright Web Scraping with Playwright Tutorial How to scrape the web with Playwright Playwright 是微软在 2020 年初开源的新一代自动化测试工具,它的功能类似于 Selenium、Pyppeteer 等,都可以驱动浏览器进行各种自动化操作。 它的功能也非常强大,对市面上的主流浏览器都提供了支持,API 功能简洁又强大。 虽然诞生比较晚,但是现在发展得非常火热。 因为 Playwright 是一个类似 Selenium 一样可以支持网页页面渲染的工具,再加上其强大又简洁的 API,Playwright 同时也可以作为网络爬虫的一个爬取利器。 控制台运行结果如下: 百度一下,你就知道 百度一下,你就知道 百度一下,你就知道 通过运行结果我们可以发现,我们非常方便地启动了三种浏览器并完成了自动化操作,并通过几个 API 就完成了截图和数据的获取,整个运行速度是非常快的,者就是 Playwright 最最基本的用法。 当然除了同步模式,Playwright 还提供异步的 API,如果我们项目里面使用了 asyncio,那就应该使用异步模式,写法如下: import asyncio from playwright.async_api import async_playwright async def main(): async with async_playwright() as p: for browser_type in [p.chromium, p.firefox, p.webkit]: browser = await browser_type.launch() page = await browser.new_page() await page.goto('https://www.baidu.com') await page.screenshot(path=f'screenshot-{browser_type.name}.png') print(await page.title()) await browser.close() asyncio.run(main()) 可以看到整个写法和同步模式基本类似,导入的时候使用的是 async_playwright 方法,而不再是 sync_playwright 方法。 写法上添加了 async/await 关键字的使用,最后的运行效果是一样的。 另外我们注意到,这例子中使用了 with as 语句,with 用于上下文对象的管理,它可以返回一个上下文管理器,也就对应一个 PlaywrightContextManager 对象,无论运行期间是否抛出异常,它能够帮助我们自动分配并且释放 Playwright 的资源。
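    The synchronous example that the console output above refers to is not shown in this copy; based on the async version, it would look roughly like this sketch:

    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        for browser_type in [p.chromium, p.firefox, p.webkit]:
            browser = browser_type.launch()
            page = browser.new_page()
            page.goto('https://www.baidu.com')
            page.screenshot(path=f'screenshot-{browser_type.name}.png')
            print(page.title())
            browser.close()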

    4. Code generation

    Playwright 还有一个强大的功能,那就是可以录制我们在浏览器中的操作并将代码自动生成出来,有了这个功能,我们甚至都不用写任何一行代码,这个功能可以通过 playwright 命令行调用 codegen 来实现,我们先来看看 codegen 命令都有什么参数,输入如下命令: playwright codegen --help 结果类似如下: Usage: npx playwright codegen [options] [url] open page and generate code for user actions Options: -o, --output saves the generated script to a file --target language to use, one of javascript, python, python-async, csharp (default: "python") -b, --browser browser to use, one of cr, chromium, ff, firefox, wk, webkit (default: "chromium") --channel Chromium distribution channel, "chrome", "chrome-beta", "msedge-dev", etc --color-scheme emulate preferred color scheme, "light" or "dark" --device emulate device, for example "iPhone 11" --geolocation specify geolocation coordinates, for example "37.819722,-122.478611" --load-storage load context storage state from the file, previously saved with --save-storage --lang specify language / locale, for example "en-GB" --proxy-server specify proxy server, for example "http://myproxy:3128" or "socks5://myproxy:8080" --save-storage save context storage state at the end, for later use with --load-storage --timezone 可以看到这里有几个选项,比如 -o 代表输出的代码文件的名称; --target 代表使用的语言,默认是 python,即会生成同步模式的操作代码,如果传入 python-async 就会生成异步模式的代码; -b 代表的是使用的浏览器,默认是 Chromium,其他还有很多设置,比如 --device 可以模拟使用手机浏览器,比如 iPhone 11,--lang 代表设置浏览器的语言,--timeout 可以设置页面加载超时时间。 好,了解了这些用法,那我们就来尝试启动一个 Firefox 浏览器,然后将操作结果输出到 script.py 文件,命令如下: playwright codegen -o script.py -b firefox 这时候就弹出了一个 Firefox 浏览器,同时右侧会输出一个脚本窗口,实时显示当前操作对应的代码。 我们可以在浏览器中做任何操作,比如打开百度,然后点击输入框并输入 nba,然后再点击搜索按钮,浏览器窗口如下: ,右侧的窗口如图所示: 操作完毕之后,关闭浏览器,Playwright 会生成一个 script.py 文件,内容如下: from playwright.sync_api import sync_playwright def run(playwright): browser = playwright.firefox.launch(headless=False) context = browser.new_context() # Open new page page = context.new_page() # Go to https://www.baidu.com/ page.goto("https://www.baidu.com/") # Click input[name="wd"] page.click("input[name=\"wd\"]") # Fill input[name="wd"] page.fill("input[name=\"wd\"]", "nba") # Click text=百度一下 with page.expect_navigation(): page.click("text=百度一下") context.close() browser.close() with sync_playwright() as playwright: run(playwright) 可以看到这里生成的代码和我们之前写的示例代码几乎差不多,而且也是完全可以运行的,运行之后就可以看到它又可以复现我们刚才所做的操作了。 所以,有了这个功能,我们甚至都不用编写任何代码,只通过简单的可视化点击就能把代码生成出来,可谓是非常方便了! 另外这里有一个值得注意的点,仔细观察下生成的代码,和前面的例子不同的是,这里 new_page 方法并不是直接通过 browser 调用的,而是通过 context 变量调用的,这个 context 又是由 browser 通过调用 new_context 方法生成的。 有读者可能就会问了,这个 context 究竟是做什么的呢? 其实这个 context 变量对应的是一个 BrowserContext 对象,BrowserContext 是一个类似隐身模式的独立上下文环境,其运行资源是单独隔离的,在做一些自动化测试过程中,每个测试用例我们都可以单独创建一个 BrowserContext 对象,这样可以保证每个测试用例之间互不干扰,具体的 API 可以参考https://playwright.dev/python/docs/api/class-browsercontext。

    5. Mobile browser emulation

    Playwright 另外一个特色功能就是可以支持移动端浏览器的模拟,比如模拟打开 iPhone 12 Pro Max 上的 Safari 浏览器,然后手动设置定位,并打开百度地图并截图。 首先我们可以选定一个经纬度,比如故宫的经纬度是 39.913904, 116.39014,我们可以通过 geolocation 参数传递给 Webkit 浏览器并初始化。 示例代码如下: from playwright.sync_api import sync_playwright with sync_playwright() as p: iphone_12_pro_max = p.devices['iPhone 12 Pro Max'] browser = p.webkit.launch(headless=False) context = browser.new_context( **iphone_12_pro_max, locale='zh-CN', geolocation={'longitude': 116.39014, 'latitude': 39.913904}, permissions=['geolocation'] ) page = context.new_page() page.goto('https://amap.com') page.wait_for_load_state(state='networkidle') page.screenshot(path='location-iphone.png') browser.close() 这里我们先用 PlaywrightContextManager 对象的 devices 属性指定了一台移动设备,这里传入的是手机的型号,比如 iPhone 12 Pro Max,当然也可以传其他名称,比如 iPhone 8,Pixel 2 等。 前面我们已经了解了 BrowserContext 对象,BrowserContext 对象也可以用来模拟移动端浏览器,初始化一些移动设备信息、语言、权限、位置等信息,这里我们就用它来创建了一个移动端 BrowserContext 对象,通过 geolocation 参数传入了经纬度信息,通过 permissions 参数传入了赋予的权限信息,最后将得到的 BrowserContext 对象赋值为 context 变量。 接着我们就可以用 BrowserContext 对象来新建一个页面,还是调用 new_page 方法创建一个新的选项卡,然后跳转到高德地图,并调用了 wait_for_load_state 方法等待页面某个状态完成,这里我们传入的 state 是 networkidle,也就是网络空闲状态。 因为在页面初始化和加载过程中,肯定是伴随有网络请求的,所以加载过程中肯定不算 networkidle 状态,所以这里我们传入 networkidle 就可以标识当前页面和数据加载完成的状态。 加载完成之后,我们再调用 screenshot 方法获取当前页面截图,最后关闭浏览器。 运行下代码,可以发现这里就弹出了一个移动版浏览器,然后加载了高德地图,并定位到了故宫的位置,如图所示: 这就代表选择文本是 Log in 的节点,并点击。
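    The text-selector click described in the last sentence above would be written like this (assuming an existing page object, as in the earlier examples):

    page.click("text=Log in")   # click the node whose text is "Log in"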

    CSS selectors

    CSS selectors were introduced earlier; for example, selecting by id or class: page.click("button") page.click("#nav-bar .contact-us-item") Or selecting by a specific node attribute: page.click("[data-test=login-button]") page.click("[aria-label='Sign in']")

    CSS selectors + text

    CSS selectors can also be combined with text values. The most common forms are has-text and text: the former matches nodes containing the given string, the latter requires an exact match. For example: page.click("article:has-text('Playwright')") page.click("#nav-bar :text('Contact us')") The first clicks an article node whose text contains Playwright; the second clicks the node inside the id nav-bar whose text is exactly Contact us.

    CSS selectors + node relationships

    Selectors can also filter on node relationships, for example using has to require another selector to match inside the node: page.click(".item-description:has(.item-promo-banner)") This selects a node with class item-description that also contains a child with class item-promo-banner. There are relative-position selectors as well, such as right-of, which matches nodes located to the right of another node: page.click("input:right-of(:text('Username'))") This clicks an input node that sits to the right of the node whose text is Username.

    XPath

    XPath is supported as well, but the xpath prefix has to be given explicitly: page.click("xpath=//button") The leading xpath= marks the rest of the string as an XPath expression. For more selector types and best practices, see the official documentation: https://playwright.dev/python/docs/selectors.

    7. Common operations

    Having covered browser initialization and some basic examples, let's look at a few commonly used operation APIs. Frequently used calls such as click and fill are methods of the Page object, so everything can be looked up in the Page API documentation: https://playwright.dev/python/docs/api/class-page. A few common APIs are introduced below.

    Event listening

    Page 对象提供了一个 on 方法,它可以用来监听页面中发生的各个事件,比如 close、console、load、request、response 等等。 比如这里我们可以监听 response 事件,response 事件可以在每次网络请求得到响应的时候触发,我们可以设置对应的回调方法获取到对应 Response 的全部信息,示例如下: from playwright.sync_api import sync_playwright def on_response(response): print(f'Statue {response.status}: {response.url}') with sync_playwright() as p: browser = p.chromium.launch(headless=False) page = browser.new_page() page.on('response', on_response) page.goto('https://spa6.scrape.center/') page.wait_for_load_state('networkidle') browser.close() 这里我们在创建 Page 对象之后,就开始监听 response 事件,同时将回调方法设置为 on_response,on_response 对象接收一个参数,然后把 Response 的状态码和链接都输出出来了。 运行之后,可以看到控制台输出结果如下: Statue 200: https://spa6.scrape.center/ Statue 200: https://spa6.scrape.center/css/app.ea9d802a.css Statue 200: https://spa6.scrape.center/js/app.5ef0d454.js Statue 200: https://spa6.scrape.center/js/chunk-vendors.77daf991.js Statue 200: https://spa6.scrape.center/css/chunk-19c920f8.2a6496e0.css ... Statue 200: https://spa6.scrape.center/css/chunk-19c920f8.2a6496e0.css Statue 200: https://spa6.scrape.center/js/chunk-19c920f8.c3a1129d.js Statue 200: https://spa6.scrape.center/img/logo.a508a8f0.png Statue 200: https://spa6.scrape.center/fonts/element-icons.535877f5.woff Statue 301: https://spa6.scrape.center/api/movie?limit=10&offset=0&token=NGMwMzFhNGEzMTFiMzJkOGE0ZTQ1YjUzMTc2OWNiYTI1Yzk0ZDM3MSwxNjIyOTE4NTE5 Statue 200: https://spa6.scrape.center/api/movie/?limit=10&offset=0&token=NGMwMzFhNGEzMTFiMzJkOGE0ZTQ1YjUzMTc2OWNiYTI1Yzk0ZDM3MSwxNjIyOTE4NTE5 Statue 200: https://p0.meituan.net/movie/da64660f82b98cdc1b8a3804e69609e041108.jpg@464w_644h_1e_1c Statue 200: https://p0.meituan.net/movie/283292171619cdfd5b240c8fd093f1eb255670.jpg@464w_644h_1e_1c .... Statue 200: https://p1.meituan.net/movie/b607fba7513e7f15eab170aac1e1400d878112.jpg@464w_644h_1e_1c “注意: 这里省略了部分重复的内容。 ”可以看到,这里的输出结果其实正好对应浏览器 Network 面板中所有的请求和响应内容,和下图是一一对应的: 这里我们调用了 route 方法,第一个参数通过正则表达式传入了匹配的 URL 路径,这里代表的是任何包含 .png .jpg 的链接,遇到这样的请求,会回调 cancel_request 方法处理,cancel_request 方法可以接收两个参数,一个是 route,代表一个 CallableRoute 对象,另外一个是 request,代表 Request 对象。 这里我们直接调用了 route 的 abort 方法,取消了这次请求,所以最终导致的结果就是图片的加载全部取消了。 观察下运行结果,如图所示: 这里我们使用 route 的 fulfill 方法指定了一个本地文件,就是刚才我们定义的 HTML 文件,运行结果如下: Playwright is a browser automation library very similar to Puppeteer. Both allow you to control a web browser with only a few lines of code. The possibilities are endless. From automating mundane tasks and testing web applications to data mining. With Playwright you can run Firefox and Safari (WebKit), not only Chromium based browsers. It will also save you time, because Playwright automates away repetitive code, such as waiting for buttons to appear in the page.
    You don’t need to be familiar with Playwright, Puppeteer or web scraping to enjoy this tutorial, but knowledge of HTML, CSS and JavaScript is expected.
    In this tutorial you’ll learn how to: Start a browser with Playwright Click buttons and wait for actions Extract data from a website

    The Project

    To showcase the basics of Playwright, we will create a simple scraper that extracts data about GitHub Topics. You’ll be able to select a topic and the scraper will return information about repositories tagged with it. We will use Playwright to start a browser, open the GitHub topic page, click the Load more button to display more repositories, and then extract the following information: Owner Name URL Number of stars Description List of repository topics

    Installation

    To use Playwright you’ll need Node.js version higher than 10 and a package manager. We’ll use npm, which comes preinstalled with Node.js. You can confirm their existence on your machine by running: node -v && npm -v If you’re missing either Node.js or NPM, visit the installation tutorial to get started. Now that we know our environment checks out, let’s create a new project and install Playwright. mkdir playwright-scraper && cd playwright-scraper npm init -y npm i playwright
    The first time you install Playwright, it will download browser binaries, so the installation may take a bit longer.

    Building a scraper

    Creating a scraper with Playwright is surprisingly easy, even if you have no previous scraping experience. If you understand JavaScript and CSS, it will be a piece of cake. In your project folder, create a file called scraper.js (or choose any other name) and open it in your favorite code editor. First, we will confirm that Playwright is correctly installed and working by running a simple script. Now run it using your code editor or by executing the following command in your project folder: node scraper.js If you saw a Chromium window open and the GitHub Topics page successfully loaded, congratulations, you just robotized your web browser with Playwright!

    Loading more repositories

    When you first open the topic page, the number of displayed repositories is limited to 30. You can load more by clicking the Load more… button at the bottom of the page. There are two things we need to tell Playwright to load more repositories: Click the Load more… button. Wait for the repositories to load. Clicking buttons is extremely easy with Playwright. By prefixing text= to a string you’re looking for, Playwright will find the element that includes this string and click it. It will also wait for the element to appear if it’s not rendered on the page yet. This is a huge improvement over Puppeteer and it makes Playwright lovely to work with. After clicking, we need to wait for the repositories to load. If we didn’t, the scraper could finish before the new repositories show up on the page and we would miss that data. page.waitForFunction() allows you to execute a function inside the browser and wait until the function returns true. To find the article.border selector, we used the browser Dev Tools, which you can open in most browsers by right-clicking anywhere on the page and selecting Inspect. It means: select the <article> tag with the border class. Let’s plug this into our code and do a test run. If you watch the run, you’ll see that the browser first scrolls down and clicks the Load more… button, which changes the text into Loading more. After a second or two, you’ll see the next batch of 30 repositories appear. Great job!

    Extracting data

    Now that we know how to load more repositories, we will extract the data we want. To do this, we’ll use the page.$$eval function. It tells the browser to find certain elements and then execute a JavaScript function with those elements. It works like this: page.$$eval finds our repositories and executes the provided function in the browser. We get repoCards, which is an Array of all the repo elements. The return value of the function becomes the return value of the page.$$eval call. Thanks to Playwright, you can pull data out of the browser and save it to a variable in Node.js. Magic! If you’re struggling to understand the extraction code itself, be sure to check out this guide on working with CSS selectors and this tutorial on using those selectors to find HTML elements. And here’s the code with extraction included. When you run it, you’ll see 60 repositories with their information printed to the console.

    Conclusion

    In this tutorial we learned how to start a browser with Playwright, and control its actions with some of Playwright’s most useful functions: page.click() to emulate mouse clicks, page.waitForFunction() to wait for things to happen and page.$$eval() to extract data from a browser page. But we’ve only scratched the surface of what’s possible with Playwright. You can log into websites, fill forms, intercept network communication, and most importantly, use almost any browser in existence. Where will you take this project next? How about turning it into a command-line interface (CLI) tool that takes a topic and number of repositories on input and outputs a file with the repositories? You can do it now.

    Python - Command Line Arguments

    Python provides the getopt module to help you parse command-line options and arguments. Suppose a script is invoked as: $ python test.py arg1 arg2 arg3 The sys module provides access to the command-line arguments via sys.argv −
  • sys.argv is the list of command-line arguments.
  • len(sys.argv) is the number of command-line arguments.
  • Here sys.argv[0] is the program, i.e. the script name. Example: consider the following script test.py − #!/usr/bin/python import sys print('Number of arguments:', len(sys.argv), 'arguments.') print('Argument List:', str(sys.argv)) Now run the above script as follows − $ python test.py arg1 arg2 arg3 This produces the following result − Number of arguments: 4 arguments. Argument List: ['test.py', 'arg1', 'arg2', 'arg3'] NOTE − As mentioned above, the first argument is always the script name and it is also counted in the number of arguments.

    Parsing Command-Line Arguments

    Python provides the getopt module to help you parse command-line options and arguments. This module provides two functions and an exception to enable command-line argument parsing. getopt.getopt method This method parses the command-line options and parameter list. The syntax is − getopt.getopt(args, options, [long_options]) Here is the detail of the parameters −
  • args − This is the argument list to be parsed.
  • options − This is the string of option letters that the script wants to recognize; options that require an argument must be followed by a colon (:).
  • long_options − This is an optional parameter; if specified, it must be a list of strings with the names of the long options that should be supported. Long options that require an argument must be followed by an equal sign ('='). To accept only long options, options should be an empty string.
  • This method returns a value consisting of two elements: the first is a list of (option, value) pairs; the second is the list of program arguments left after the option list was stripped.
  • Each option-and-value pair returned has the option as its first element, prefixed with a hyphen for short options (e.g., '-x') or two hyphens for long options (e.g., '--long-option').
  • Exception getopt.GetoptError

    This is raised when an unrecognized option is found in the argument list or when an option requiring an argument is given none. The argument to the exception is a string indicating the cause of the error. The attributes msg and opt give the error message and related option.

    Example

    Consider we want to pass two file names through command line and we also want to give an option to check the usage of the script. Usage of the script is as follows − usage: test.py -i <inputfile> -o <outputfile> Here is the following script to test.py − #!/usr/bin/python import sys, getopt def main(argv): inputfile = '' outputfile = '' try: opts, args = getopt.getopt(argv,"hi:o:",["ifile=","ofile="]) except getopt.GetoptError: print( 'test.py -i <inputfile> -o <outputfile>') sys.exit(2) for opt, arg in opts: if opt == '-h': print( 'test.py -i <inputfile> -o <outputfile>') sys.exit() elif opt in ("-i", "--ifile"): inputfile = arg elif opt in ("-o", "--ofile"): outputfile = arg print( 'Input file is "', inputfile) print( 'Output file is "', outputfile) Now, run above script as follows − $ test.py -h usage: test.py -i <inputfile> -o <outputfile> $ test.py -i BMP -o usage: test.py -i <inputfile> -o <outputfile> $ test.py -i inputfile Input file is " inputfile Output file is "

    process command line arguments

    import sys print("\n".join(sys.argv)) sys.argv is a list that contains all the arguments passed to the script on the command line. sys.argv[0] is the script name. import sys print(sys.argv[1:]) from argparse import ArgumentParser parser = ArgumentParser() parser.add_argument("-f", "--file", dest="filename", help="write report to FILE", metavar="FILE") parser.add_argument("-q", "--quiet", action="store_false", dest="verbose", default=True, help="don't print status messages to stdout") args = parser.parse_args()
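    For completeness, a self-contained sketch of reading the parsed values (parse_args is given an explicit list here so the demo runs without real command-line arguments; the file name is invented):

    from argparse import ArgumentParser

    parser = ArgumentParser()
    parser.add_argument("-f", "--file", dest="filename", metavar="FILE")
    parser.add_argument("-q", "--quiet", action="store_false", dest="verbose", default=True)
    args = parser.parse_args(["-f", "report.txt"])   # as if invoked as: python script.py -f report.txt
    print(args.filename, args.verbose)               # -> report.txt True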

    streamlit

    https://hackernoon.com/how-to-use-streamlit-and-python-to-build-a-data-science-app Use Streamlit and Python to Build a Data Science App https://github.com/streamlit/streamlit streamlit
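    A minimal sketch of a Streamlit app (the file name app.py and the data are invented); save it and run streamlit run app.py:

    # app.py -- run with:  streamlit run app.py
    import pandas as pd
    import streamlit as st

    st.title("Tiny data app")
    n = st.slider("How many points?", 10, 100, 50)             # label, min, max, default
    df = pd.DataFrame({"x": range(n), "y": [i * i for i in range(n)]})
    st.line_chart(df.set_index("x"))                            # simple interactive chart
    st.write(df.head())                                         # show the first rows as a table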

    Uninstalling/removing Python packages

    pip uninstall camelot
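    A typical removal session looks like this (the package name is taken from the line above; pip normally asks for confirmation unless -y is passed):

    pip show camelot          # check whether the package is installed, and where
    pip uninstall camelot     # prompts before removing the package
    pip uninstall -y camelot  # remove without prompting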

    Extract text from PDF

    import PyPDF2 pdfFileObj = open('acer aspire 515.pdf', 'rb') pdfReader = PyPDF2.PdfFileReader(pdfFileObj) print(pdfReader.numPages) pageObj = pdfReader.getPage(0) print(pageObj.extractText()) # closing the pdf file object pdfFileObj.close()

    Python projects

    GUI applications
    1. Calculator
    2. Notepad
    3. Login and registration
    Game development
    1. 2048
    2. Snake
    3. Tetris
    4. Lianliankan (tile matching)


    GUI applications

    1. Calculator

    1. 案例介绍 本例利用 Python 开发一个可以进行简单的四则运算的图形化计算器,会用到 Tkinter 图形组件进行开发。 主要知识点: Python Tkinter 界面编程; 计算器逻辑运算实现。 本例难度为初级,适合具有 Python 基础和 Tkinter 组件编程知识的用户学习。 2. 设计原理 要制作一个计算器,首先需要知道它由哪些部分组成。 示意如下图所示。 从结构上来说,一个简单的图形界面,需要由界面组件、组件的事件监听器(响应各类事件的逻辑)和具体的事件处理逻辑组成。 界面实现的主要工作是创建各个界面组件对象,对其进行初始化,以及控制各组件之间的层次关系和布局。 3. 示例效果 4. 示例源码 import tkinter import math import tkinter.messagebox class Calculator(object): # 界面布局方法 def __init__(self): # 创建主界面,并且保存到成员属性中 self.root = tkinter.Tk() self.root.minsize(280, 450) self.root.maxsize(280, 470) self.root.title('计算器') # 设置显式面板的变量 self.result = tkinter.StringVar() self.result.set(0) # 设置一个全局变量 运算数字和f符号的列表 self.lists = [] # 添加一个用于判断是否按下运算符号的标志 self.ispresssign = False # 界面布局 self.menus() self.layout() self.root.mainloop() # 计算器菜单界面摆放 def menus(self): # 添加菜单 # 创建总菜单 allmenu = tkinter.Menu(self.root) # 添加子菜单 filemenu = tkinter.Menu(allmenu, tearoff=0) # 添加选项卡 filemenu.add_command( label='标准型(T) Alt+1', command=self.myfunc) filemenu.add_command( label='科学型(S) Alt+2', command=self.myfunc) filemenu.add_command( label='程序员(P) Alt+3', command=self.myfunc) filemenu.add_command(label='统计信息(A) Alt+4', command=self.myfunc) # 添加分割线 filemenu.add_separator() # 添加选项卡 filemenu.add_command(label='历史记录(Y) Ctrl+H', command=self.myfunc) filemenu.add_command(label='数字分组(I)', command=self.myfunc) # 添加分割线 filemenu.add_separator() # 添加选项卡 filemenu.add_command( label='基本(B) Ctrl+F4', command=self.myfunc) filemenu.add_command(label='单位转换(U) Ctrl+U', command=self.myfunc) filemenu.add_command(label='日期计算(D) Ctrl+E', command=self.myfunc) menu1 = tkinter.Menu(filemenu, tearoff=0) menu1.add_command(label='抵押(M)', command=self.myfunc) menu1.add_command(label='汽车租赁(V)', command=self.myfunc) menu1.add_command(label='油耗(mpg)(F)', command=self.myfunc) menu1.add_command(label='油耗(l/100km)(U)', command=self.myfunc) filemenu.add_cascade(label='工作表(W)', menu=menu1) allmenu.add_cascade(label='查看(V)', menu=filemenu) # 添加子菜单2 editmenu = tkinter.Menu(allmenu, tearoff=0) # 添加选项卡 editmenu.add_command(label='复制(C) Ctrl+C', command=self.myfunc) editmenu.add_command(label='粘贴(V) Ctrl+V', command=self.myfunc) # 添加分割线 editmenu.add_separator() # 添加选项卡 menu2 = tkinter.Menu(filemenu, tearoff=0) menu2.add_command(label='复制历史记录(I)', command=self.myfunc) menu2.add_command( label='编辑(E) F2', command=self.myfunc) menu2.add_command(label='取消编辑(N) Esc', command=self.myfunc) menu2.add_command(label='清除(L) Ctrl+Shift+D', command=self.myfunc) editmenu.add_cascade(label='历史记录(H)', menu=menu2) allmenu.add_cascade(label='编辑(E)', menu=editmenu) # 添加子菜单3 helpmenu = tkinter.Menu(allmenu, tearoff=0) # 添加选项卡 helpmenu.add_command(label='查看帮助(V) F1', command=self.myfunc) # 添加分割线 helpmenu.add_separator() # 添加选项卡 helpmenu.add_command(label='关于计算器(A)', command=self.myfunc) allmenu.add_cascade(label='帮助(H)', menu=helpmenu) self.root.config(menu=allmenu) # 计算器主界面摆放 def layout(self): # 显示屏 result = tkinter.StringVar() result.set(0) show_label = tkinter.Label(self.root, bd=3, bg='white', font=( '宋体', 30), anchor='e', textvariable=self.result) show_label.place(x=5, y=20, width=270, height=70) # 功能按钮MC button_mc = tkinter.Button(self.root, text='MC', command=self.wait) button_mc.place(x=5, y=95, width=50, height=50) # 功能按钮MR button_mr = tkinter.Button(self.root, text='MR', command=self.wait) button_mr.place(x=60, y=95, width=50, height=50) # 功能按钮MS button_ms = tkinter.Button(self.root, text='MS', command=self.wait) button_ms.place(x=115, y=95, width=50, height=50) # 功能按钮M+ button_mjia = tkinter.Button(self.root, text='M+', command=self.wait) 
button_mjia.place(x=170, y=95, width=50, height=50) # 功能按钮M- button_mjian = tkinter.Button(self.root, text='M-', command=self.wait) button_mjian.place(x=225, y=95, width=50, height=50) # 功能按钮← button_zuo = tkinter.Button(self.root, text='←', command=self.dele_one) button_zuo.place(x=5, y=150, width=50, height=50) # 功能按钮CE button_ce = tkinter.Button( self.root, text='CE', command=lambda: self.result.set(0)) button_ce.place(x=60, y=150, width=50, height=50) # 功能按钮C button_c = tkinter.Button(self.root, text='C', command=self.sweeppress) button_c.place(x=115, y=150, width=50, height=50) # 功能按钮± button_zf = tkinter.Button(self.root, text='±', command=self.zf) button_zf.place(x=170, y=150, width=50, height=50) # 功能按钮√ button_kpf = tkinter.Button(self.root, text='√', command=self.kpf) button_kpf.place(x=225, y=150, width=50, height=50) # 数字按钮7 button_7 = tkinter.Button( self.root, text='7', command=lambda: self.pressnum('7')) button_7.place(x=5, y=205, width=50, height=50) # 数字按钮8 button_8 = tkinter.Button( self.root, text='8', command=lambda: self.pressnum('8')) button_8.place(x=60, y=205, width=50, height=50) # 数字按钮9 button_9 = tkinter.Button( self.root, text='9', command=lambda: self.pressnum('9')) button_9.place(x=115, y=205, width=50, height=50) # 功能按钮/ button_division = tkinter.Button( self.root, text='/', command=lambda: self.presscalculate('/')) button_division.place(x=170, y=205, width=50, height=50) # 功能按钮% button_remainder = tkinter.Button( self.root, text='//', command=lambda: self.presscalculate('//')) button_remainder.place(x=225, y=205, width=50, height=50) # 数字按钮4 button_4 = tkinter.Button( self.root, text='4', command=lambda: self.pressnum('4')) button_4.place(x=5, y=260, width=50, height=50) # 数字按钮5 button_5 = tkinter.Button( self.root, text='5', command=lambda: self.pressnum('5')) button_5.place(x=60, y=260, width=50, height=50) # 数字按钮6 button_6 = tkinter.Button( self.root, text='6', command=lambda: self.pressnum('6')) button_6.place(x=115, y=260, width=50, height=50) # 功能按钮* button_multiplication = tkinter.Button( self.root, text='*', command=lambda: self.presscalculate('*')) button_multiplication.place(x=170, y=260, width=50, height=50) # 功能按钮1/x button_reciprocal = tkinter.Button( self.root, text='1/x', command=self.ds) button_reciprocal.place(x=225, y=260, width=50, height=50) # 数字按钮1 button_1 = tkinter.Button( self.root, text='1', command=lambda: self.pressnum('1')) button_1.place(x=5, y=315, width=50, height=50) # 数字按钮2 button_2 = tkinter.Button( self.root, text='2', command=lambda: self.pressnum('2')) button_2.place(x=60, y=315, width=50, height=50) # 数字按钮3 button_3 = tkinter.Button( self.root, text='3', command=lambda: self.pressnum('3')) button_3.place(x=115, y=315, width=50, height=50) # 功能按钮- button_subtraction = tkinter.Button( self.root, text='-', command=lambda: self.presscalculate('-')) button_subtraction.place(x=170, y=315, width=50, height=50) # 功能按钮= button_equal = tkinter.Button( self.root, text='=', command=lambda: self.pressequal()) button_equal.place(x=225, y=315, width=50, height=105) # 数字按钮0 button_0 = tkinter.Button( self.root, text='0', command=lambda: self.pressnum('0')) button_0.place(x=5, y=370, width=105, height=50) # 功能按钮. 
button_point = tkinter.Button( self.root, text='.', command=lambda: self.pressnum('.')) button_point.place(x=115, y=370, width=50, height=50) # 功能按钮+ button_plus = tkinter.Button( self.root, text='+', command=lambda: self.presscalculate('+')) button_plus.place(x=170, y=370, width=50, height=50) # 计算器菜单功能 def myfunc(self): tkinter.messagebox.showinfo('', '预留接口,学成之后,你是不是有冲动添加该功能.') # 数字方法 def pressnum(self, num): # 全局化变量 # 判断是否按下了运算符号 if self.ispresssign == False: pass else: self.result.set(0) # 重置运算符号的状态 self.ispresssign = False if num == '.': num = '0.' # 获取面板中的原有数字 oldnum = self.result.get() # 判断界面数字是否为0 if oldnum == '0': self.result.set(num) else: # 连接上新按下的数字 newnum = oldnum + num # 将按下的数字写到面板中 self.result.set(newnum) # 运算函数 def presscalculate(self, sign): # 保存已经按下的数字和运算符号 # 获取界面数字 num = self.result.get() self.lists.append(num) # 保存按下的操作符号 self.lists.append(sign) # 设置运算符号为按下状态 self.ispresssign = True # 获取运算结果 def pressequal(self): # 获取所有的列表中的内容(之前的数字和操作) # 获取当前界面上的数字 curnum = self.result.get() # 将当前界面的数字存入列表 self.lists.append(curnum) # 将列表转化为字符串 calculatestr = ''.join(self.lists) # 使用eval执行字符串中的运算即可 endnum = eval(calculatestr) # 将运算结果显示在界面中 self.result.set(str(endnum)[:10]) if self.lists != 0: self.ispresssign = True # 清空运算列表 self.lists.clear() # 暂未开发说明 def wait(self): tkinter.messagebox.showinfo('', '更新中......') # ←按键功能 def dele_one(self): if self.result.get() == '' or self.result.get() == '0': self.result.set('0') return else: num = len(self.result.get()) if num > 1: strnum = self.result.get() strnum = strnum[0:num - 1] self.result.set(strnum) else: self.result.set('0') # ±按键功能 def zf(self): strnum = self.result.get() if strnum[0] == '-': self.result.set(strnum[1:]) elif strnum[0] != '-' and strnum != '0': self.result.set('-' + strnum) # 1/x按键功能 def ds(self): dsnum = 1 / int(self.result.get()) self.result.set(str(dsnum)[:10]) if self.lists != 0: self.ispresssign = True # 清空运算列表 self.lists.clear() # C按键功能 def sweeppress(self): self.lists.clear() self.result.set(0) # √按键功能 def kpf(self): strnum = float(self.result.get()) endnum = math.sqrt(strnum) if str(endnum)[-1] == '0': self.result.set(str(endnum)[:-2]) else: self.result.set(str(endnum)[:10]) if self.lists != 0: self.ispresssign = True # 清空运算列表 self.lists.clear() # 实例化对象 my_calculator = Calculator()

    2. Notepad

    1. 案例介绍 tkinter 是 Python下面向 tk 的图形界面接口库,可以方便地进行图形界面设计和交互操作编程。 tkinter 的优点是简单易用、与 Python 的结合度好。 tkinter 在 Python 3.x 下默认集成,不需要额外的安装操作; 不足之处为缺少合适的可视化界面设计工具,需要通过代码来完成窗口设计和元素布局。 本例采用的 Python 版本为 3.8,如果想在 python 2.x下使用 tkinter,请先进行安装。 需要注意的是,不同 Python 版本下的 tkinter 使用方式可能略有不同,建议采用 Python3.x 版本。 本例难度为中级,适合具有 Python 基础和 Tkinter 组件编程知识的用户学习。 2. 示例效果 3. 示例源码 from tkinter import * from tkinter.filedialog import * from tkinter.messagebox import * import os filename = "" def author(): showinfo(title="作者", message="Python") def power(): showinfo(title="版权信息", message="课堂练习") def mynew(): global top, filename, textPad top.title("未命名文件") filename = None textPad.delete(1.0, END) def myopen(): global filename filename = askopenfilename(defaultextension=".txt") if filename == "": filename = None else: top.title("记事本" + os.path.basename(filename)) textPad.delete(1.0, END) f = open(filename, 'r') textPad.insert(1.0, f.read()) f.close() def mysave(): global filename try: f = open(filename, 'w') msg = textPad.get(1.0, 'end') f.write(msg) f.close() except: mysaveas() def mysaveas(): global filename f = asksaveasfilename(initialfile="未命名.txt", defaultextension=".txt") filename = f fh = open(f, 'w') msg = textPad.get(1.0, END) fh.write(msg) fh.close() top.title("记事本 " + os.path.basename(f)) def cut(): global textPad textPad.event_generate("<<Cut>>") def copy(): global textPad textPad.event_generate("<<Copy>>") def paste(): global textPad textPad.event_generate("<<Paste>>") def undo(): global textPad textPad.event_generate("<<Undo>>") def redo(): global textPad textPad.event_generate("<<Redo>>") def select_all(): global textPad # textPad.event_generate("<<Cut>>") textPad.tag_add("sel", "1.0", "end") def find(): t = Toplevel(top) t.title("查找") t.geometry("260x60+200+250") t.transient(top) Label(t, text="查找: ").grid(row=0, column=0, sticky="e") v = StringVar() e = Entry(t, width=20, textvariable=v) e.grid(row=0, column=1, padx=2, pady=2, sticky="we") e.focus_set() c = IntVar() Checkbutton(t, text="不区分大小写", variable=c).grid(row=1, column=1, sticky='e') Button(t, text="查找所有", command=lambda: search(v.get(), c.get(), textPad, t, e)).grid(row=0, column=2, sticky="e" + "w", padx=2, pady=2) def close_search(): textPad.tag_remove("match", "1.0", END) t.destroy() t.protocol("WM_DELETE_WINDOW", close_search) def mypopup(event): # global editmenu editmenu.tk_popup(event.x_root, event.y_root) def search(needle, cssnstv, textPad, t, e): textPad.tag_remove("match", "1.0", END) count = 0 if needle: pos = "1.0" while True: pos = textPad.search(needle, pos, nocase=cssnstv, stopindex=END) if not pos: break lastpos = pos + str(len(needle)) textPad.tag_add("match", pos, lastpos) count += 1 pos = lastpos textPad.tag_config('match', fg='yellow', bg="green") e.focus_set() t.title(str(count) + "个被匹配") top = Tk() top.title("记事本") top.geometry("600x400+100+50") menubar = Menu(top) # 文件功能 filemenu = Menu(top) filemenu.add_command(label="新建", accelerator="Ctrl+N", command=mynew) filemenu.add_command(label="打开", accelerator="Ctrl+O", command=myopen) filemenu.add_command(label="保存", accelerator="Ctrl+S", command=mysave) filemenu.add_command(label="另存为", accelerator="Ctrl+shift+s", command=mysaveas) menubar.add_cascade(label="文件", menu=filemenu) # 编辑功能 editmenu = Menu(top) editmenu.add_command(label="撤销", accelerator="Ctrl+Z", command=undo) editmenu.add_command(label="重做", accelerator="Ctrl+Y", command=redo) editmenu.add_separator() editmenu.add_command(label="剪切", accelerator="Ctrl+X", command=cut) editmenu.add_command(label="复制", 
accelerator="Ctrl+C", command=copy) editmenu.add_command(label="粘贴", accelerator="Ctrl+V", command=paste) editmenu.add_separator() editmenu.add_command(label="查找", accelerator="Ctrl+F", command=find) editmenu.add_command(label="全选", accelerator="Ctrl+A", command=select_all) menubar.add_cascade(label="编辑", menu=editmenu) # 关于 功能 aboutmenu = Menu(top) aboutmenu.add_command(label="作者", command=author) aboutmenu.add_command(label="版权", command=power) menubar.add_cascade(label="关于", menu=aboutmenu) top['menu'] = menubar # shortcutbar = Frame(top, height=25, bg='light sea green') # shortcutbar.pack(expand=NO, fill=X) # Inlabe = Label(top, width=2, bg='antique white') # Inlabe.pack(side=LEFT, anchor='nw', fill=Y) textPad = Text(top, undo=True) textPad.pack(expand=YES, fill=BOTH) scroll = Scrollbar(textPad) textPad.config(yscrollcommand=scroll.set) scroll.config(command=textPad.yview) scroll.pack(side=RIGHT, fill=Y) # 热键绑定 textPad.bind("<Control-N>", mynew) textPad.bind("<Control-n>", mynew) textPad.bind("<Control-O>", myopen) textPad.bind("<Control-o>", myopen) textPad.bind("<Control-S>", mysave) textPad.bind("<Control-s>", mysave) textPad.bind("<Control-A>", select_all) textPad.bind("<Control-a>", select_all) textPad.bind("<Control-F>", find) textPad.bind("<Control-f>", find) textPad.bind("<Button-3>", mypopup) top.mainloop()

    3. Login and registration

    1. 案例介绍 本例设计一个用户登录和注册模块,使用 Tkinter 框架构建界面,主要用到画布、文本框、按钮等组件。 涉及知识点: Python Tkinter 界面编程、pickle 数据存储。 本例实现了基本的用户登录和注册互动界面,并提供用户信息存储和验证。 pickle 是 python 语言的一个标准模块,安装 python 后已包含 pickle 库,不需要单独再安装。 pickle 模块实现了基本的数据序列化和反序列化。 通过 pickle 模块的序列化操作能够将程序中运行的对象信息保存到文件中去,永久存储; 通过 pickle 模块的反序列化操作,能够从文件中创建上一次程序保存的对象。 本例难度为中级,适合具有 Python 基础和 Tkinter 组件编程知识的用户学习。 2. 示例效果 3. 示例源码 import tkinter as tk import pickle import tkinter.messagebox from PIL import Image, ImageTk # 设置窗口---最开始的母体窗口 window = tk.Tk() # 建立一个窗口 window.title('欢迎登录') window.geometry('450x300') # 窗口大小为300x200 # 画布 canvas = tk.Canvas(window, height=200, width=900) # 加载图片 im = Image.open("images/01.png") image_file = ImageTk.PhotoImage(im) # image_file = tk.PhotoImage(file='images/01.gif') image = canvas.create_image(100, 40, anchor='nw', image=image_file) canvas.pack(side='top') # 两个文字标签,用户名和密码两个部分 tk.Label(window, text='用户名').place(x=100, y=150) tk.Label(window, text='密 码').place(x=100, y=190) var_usr_name = tk.StringVar() # 讲文本框的内容,定义为字符串类型 var_usr_name.set('amoxiang@163.com') # 设置默认值 var_usr_pwd = tk.StringVar() # 第一个输入框-用来输入用户名的。 # textvariable 获取文本框的内容 entry_usr_name = tk.Entry(window, textvariable=var_usr_name) entry_usr_name.place(x=160, y=150) # 第二个输入框-用来输入密码的。 entry_usr_pwd = tk.Entry(window, textvariable=var_usr_pwd, show='*') entry_usr_pwd.place(x=160, y=190) def usr_login(): usr_name = var_usr_name.get() usr_pwd = var_usr_pwd.get() try: with open('usrs_info.pickle', 'rb') as usr_file: usrs_info = pickle.load(usr_file) except FileNotFoundError: with open('usrs_info.pickle', 'wb') as usr_file: usrs_info = {'admin': 'admin'} pickle.dump(usrs_info, usr_file) if usr_name in usrs_info: if usr_pwd == usrs_info[usr_name]: tk.messagebox.showinfo( title='欢迎光临', message=usr_name + ': 请进入个人首页,查看最新资讯') else: tk.messagebox.showinfo(message='错误提示: 密码不对,请重试') else: is_sign_up = tk.messagebox.askyesno('提示', '你还没有注册,请先注册') print(is_sign_up) if is_sign_up: usr_sign_up() # 注册按钮 def usr_sign_up(): def sign_to_Mofan_Python(): np = new_pwd.get() npf = new_pwd_confirm.get() nn = new_name.get() # 上面是获取数据,下面是查看一下是否重复注册过 with open('usrs_info.pickle', 'rb') as usr_file: exist_usr_info = pickle.load(usr_file) if np != npf: tk.messagebox.showerror('错误提示', '密码和确认密码必须一样') elif nn in exist_usr_info: tk.messagebox.showerror('错误提示', '用户名早就注册了! 
') else: exist_usr_info[nn] = np with open('usrs_info.pickle', 'wb') as usr_file: pickle.dump(exist_usr_info, usr_file) tk.messagebox.showinfo('欢迎', '你已经成功注册了') window_sign_up.destroy() # 点击注册之后,会弹出这个窗口界面。 window_sign_up = tk.Toplevel(window) window_sign_up.title('欢迎注册') window_sign_up.geometry('360x200') # 中间是x,而不是*号 # 用户名框--这里输入用户名框。 new_name = tk.StringVar() new_name.set('amoxiang@163.com') # 设置的是默认值 tk.Label(window_sign_up, text='用户名').place(x=10, y=10) entry_new_name = tk.Entry(window_sign_up, textvariable=new_name) entry_new_name.place(x=100, y=10) # 新密码框--这里输入注册时候的密码 new_pwd = tk.StringVar() tk.Label(window_sign_up, text='密 码').place(x=10, y=50) entry_usr_pwd = tk.Entry(window_sign_up, textvariable=new_pwd, show='*') entry_usr_pwd.place(x=100, y=50) # 密码确认框 new_pwd_confirm = tk.StringVar() tk.Label(window_sign_up, text='确认密码').place(x=10, y=90) entry_usr_pwd_confirm = tk.Entry( window_sign_up, textvariable=new_pwd_confirm, show='*') entry_usr_pwd_confirm.place(x=100, y=90) btn_confirm_sign_up = tk.Button( window_sign_up, text=' 注 册 ', command=sign_to_Mofan_Python) btn_confirm_sign_up.place(x=120, y=130) # 创建注册和登录按钮 btn_login = tk.Button(window, text=' 登 录 ', command=usr_login) btn_login.place(x=150, y=230) # 用place来处理按钮的位置信息。 btn_sign_up = tk.Button(window, text=' 注 册 ', command=usr_sign_up) btn_sign_up.place(x=250, y=230) window.mainloop()
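
    A minimal sketch of the pickle serialize/deserialize cycle the example relies on (the file name follows the example; any path works):
    import pickle

    # Serialize a dict of user credentials to disk (永久存储)
    users = {'admin': 'admin'}
    with open('usrs_info.pickle', 'wb') as f:
        pickle.dump(users, f)

    # Deserialize it back into an equivalent object
    with open('usrs_info.pickle', 'rb') as f:
        restored = pickle.load(f)

    print(restored == users)  # True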

    游戏开发

    1、2048

    1. 游戏简介 2048 是一款比较流行的数字游戏。 游戏规则: 每次可按上、下、左、右方向键滑动数字,每滑动一次,所有数字都会往滑动方向靠拢,同时在空白位置随机出现一个数字,相同数字在靠拢时会相加。 不断叠加最终拼出 2048 这个数字算成功。 2048 最早于 2014年3月20日发行。 原版 2048 首先在 GitHub 上发布,原作者是 Gabriele Cirulli,后被移植到各个平台。 本例难度为初级,适合具有 Python 基础和 Pygame 编程知识的用户学习。 2. 设计原理 这个游戏的本质是二维列表,就以 4*4 的二位列表来分析关键的逻辑以及实现。 二维列表如下图: 所有的操作都是对这个二维列表的数据的操作。 分为上下左右四个方向。 先说向左的方向(如图)。 向左操作的结果如下图; 当向左的方向是,所有的数据沿着水平方向向左跑。 水平说明操作的是二维列表的一行,而垂直操作的则是二位列表的一列。 这样就可以将二维列表的操作变成遍历后对一维列表的操作。 向左说明数据的优先考虑的位置是从左开始的。 这样就确定了一维列表的遍历开始的位置。 上面第 2 个图共四行,每一个行都能得到一个列表。 list1: [0,0,2,0] list2: [0,4,2,0] list3: [0,0,4,4] list4: [2,0,2,0] 这样一来向左的方向就变成。 从上到下获得每一行的列表,方向向左。 参数(row,left)。 其他的三个方向在开始的时候记住是怎样获得以为列表的,等操作完才放回去这样就能实现了。 3. 示例效果 4. 示例源码 import random import sys import pygame from pygame.locals import * PIXEL = 150 SCORE_PIXEL = 100 SIZE = 4 # 地图的类 class Map: def __init__(self, size): self.size = size self.score = 0 self.map = [[0 for i in range(size)] for i in range(size)] self.add() self.add() # 新增2或4,有1/4概率产生4 def add(self): while True: p = random.randint(0, self.size * self.size - 1) if self.map[int(p / self.size)][int(p % self.size)] == 0: x = random.randint(0, 3) > 0 and 2 or 4 self.map[int(p / self.size)][int(p % self.size)] = x self.score += x break # 地图向左靠拢,其他方向的靠拢可以通过适当旋转实现,返回地图是否更新 def adjust(self): changed = False for a in self.map: b = [] last = 0 for v in a: if v != 0: if v == last: b.append(b.pop() << 1) last = 0 else: b.append(v) last = v b += [0] * (self.size - len(b)) for i in range(self.size): if a[i] != b[i]: changed = True a[:] = b return changed # 逆时针旋转地图90度 def rotate90(self): self.map = [[self.map[c][r] for c in range(self.size)] for r in reversed(range(self.size))] # 判断游戏结束 def over(self): for r in range(self.size): for c in range(self.size): if self.map[r][c] == 0: return False for r in range(self.size): for c in range(self.size - 1): if self.map[r][c] == self.map[r][c + 1]: return False for r in range(self.size - 1): for c in range(self.size): if self.map[r][c] == self.map[r + 1][c]: return False return True def moveUp(self): self.rotate90() if self.adjust(): self.add() self.rotate90() self.rotate90() self.rotate90() def moveRight(self): self.rotate90() self.rotate90() if self.adjust(): self.add() self.rotate90() self.rotate90() def moveDown(self): self.rotate90() self.rotate90() self.rotate90() if self.adjust(): self.add() self.rotate90() def moveLeft(self): if self.adjust(): self.add() # 更新屏幕 def show(map): for i in range(SIZE): for j in range(SIZE): # 背景颜色块 screen.blit(map.map[i][j] == 0 and block[(i + j) % 2] or block[2 + (i + j) % 2], (PIXEL * j, PIXEL * i)) # 数值显示 if map.map[i][j] != 0: map_text = map_font.render( str(map.map[i][j]), True, (106, 90, 205)) text_rect = map_text.get_rect() text_rect.center = (PIXEL * j + PIXEL / 2, PIXEL * i + PIXEL / 2) screen.blit(map_text, text_rect) # 分数显示 screen.blit(score_block, (0, PIXEL * SIZE)) score_text = score_font.render((map.over( ) and "Game over with score " or "Score: ") + str(map.score), True, (106, 90, 205)) score_rect = score_text.get_rect() score_rect.center = (PIXEL * SIZE / 2, PIXEL * SIZE + SCORE_PIXEL / 2) screen.blit(score_text, score_rect) pygame.display.update() map = Map(SIZE) pygame.init() screen = pygame.display.set_mode((PIXEL * SIZE, PIXEL * SIZE + SCORE_PIXEL)) pygame.display.set_caption("2048") block = [pygame.Surface((PIXEL, PIXEL)) for i in range(4)] # 设置颜色 block[0].fill((152, 251, 152)) block[1].fill((240, 255, 255)) block[2].fill((0, 255, 127)) block[3].fill((225, 255, 255)) score_block = pygame.Surface((PIXEL * SIZE, SCORE_PIXEL)) 
score_block.fill((245, 245, 245)) # 设置字体 map_font = pygame.font.Font(None, int(PIXEL * 2 / 3)) score_font = pygame.font.Font(None, int(SCORE_PIXEL * 2 / 3)) clock = pygame.time.Clock() show(map) while not map.over(): # 12为实验参数 clock.tick(12) for event in pygame.event.get(): if event.type == QUIT: sys.exit() # 接收玩家操作 pressed_keys = pygame.key.get_pressed() if pressed_keys[K_w] or pressed_keys[K_UP]: map.moveUp() elif pressed_keys[K_s] or pressed_keys[K_DOWN]: map.moveDown() elif pressed_keys[K_a] or pressed_keys[K_LEFT]: map.moveLeft() elif pressed_keys[K_d] or pressed_keys[K_RIGHT]: map.moveRight() show(map) # 游戏结束 pygame.time.delay(3000)
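
    A compact sketch of the "merge one row to the left" step from the design notes above; the full game applies the same idea to every row after rotating the board, so all four directions reduce to a left move:
    # Slide non-zero tiles left, then combine equal neighbours once, then pad with zeros.
    def merge_left(row):
        tiles = [v for v in row if v != 0]
        merged = []
        i = 0
        while i < len(tiles):
            if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
                merged.append(tiles[i] * 2)   # equal neighbours combine
                i += 2
            else:
                merged.append(tiles[i])
                i += 1
        return merged + [0] * (len(row) - len(merged))

    print(merge_left([0, 4, 2, 0]))   # [4, 2, 0, 0]
    print(merge_left([2, 0, 2, 0]))   # [4, 0, 0, 0]
    print(merge_left([0, 0, 4, 4]))   # [8, 0, 0, 0]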

    2、贪吃蛇

    1. 案例介绍 贪吃蛇是一款经典的益智游戏,简单又耐玩。 该游戏通过控制蛇头方向吃蛋,从而使得蛇变得越来越长。 通过上下左右方向键控制蛇的方向,寻找吃的东西,每吃一口就能得到一定的积分,而且蛇的身子会越吃越长,身子越长玩的难度就越大,不能碰墙,不能咬到自己的身体,更不能咬自己的尾巴,等到了一定的分数,就能过关,然后继续玩下一关。 本例难度为中级,适合具有 Python 基础和 Pygame 编程知识的用户学习。 2. 设计要点 游戏是基于 PyGame 框架制作的,程序核心逻辑如下: 游戏界面分辨率是 640*480,蛇和食物都是由 1 个或多个 20*20 像素的正方形块儿(为了方便,下文用点表示 20*20 像素的正方形块儿) 组成,这样共有 32*24 个点,使用 pygame.draw.rect 来绘制每一个点; 初始化时蛇的长度是 3,食物是 1 个点,蛇初始的移动的方向是右,用一个数组代表蛇,数组的每个元素是蛇每个点的坐标,因此数组的第一个坐标是蛇尾,最后一个坐标是蛇头; 游戏开始后,根据蛇的当前移动方向,将蛇运动方向的前方的那个点 append 到蛇数组的末位,再把蛇尾去掉,蛇的坐标数组就相当于往前挪了一位; 如果蛇吃到了食物,即蛇头的坐标等于食物的坐标,那么在第 2 点中蛇尾就不用去掉,就产生了蛇长度增加的效果; 食物被吃掉后,随机在空的位置(不能与蛇的身体重合) 再生成一个; 通过 PyGame 的 event 监控按键,改变蛇的方向,例如当蛇向右时,下一次改变方向只能向上或者向下; 当蛇撞上自身或墙壁,游戏结束,蛇头装上自身,那么蛇坐标数组里就有和舌头坐标重复的数据,撞上墙壁则是蛇头坐标超过了边界,都很好判断; 其他细节: 做了个开始的欢迎界面; 食物的颜色随机生成; 吃到实物的时候有声音提示等。 3. 示例效果 4. 示例源码 import pygame from os import path from sys import exit from time import sleep from random import choice from itertools import product from pygame.locals import QUIT, KEYDOWN def direction_check(moving_direction, change_direction): directions = [['up', 'down'], ['left', 'right']] if moving_direction in directions[0] and change_direction in directions[1]: return change_direction elif moving_direction in directions[1] and change_direction in directions[0]: return change_direction return moving_direction class Snake: colors = list(product([0, 64, 128, 192, 255], repeat=3))[1:-1] def __init__(self): self.map = {(x, y): 0 for x in range(32) for y in range(24)} self.body = [[100, 100], [120, 100], [140, 100]] self.head = [140, 100] self.food = [] self.food_color = [] self.moving_direction = 'right' self.speed = 4 self.generate_food() self.game_started = False def check_game_status(self): if self.body.count(self.head) > 1: return True if self.head[0] < 0 or self.head[0] > 620 or self.head[1] < 0 or self.head[1] > 460: return True return False def move_head(self): moves = { 'right': (20, 0), 'up': (0, -20), 'down': (0, 20), 'left': (-20, 0) } step = moves[self.moving_direction] self.head[0] += step[0] self.head[1] += step[1] def generate_food(self): self.speed = len( self.body) // 16 if len(self.body) // 16 > 4 else self.speed for seg in self.body: x, y = seg self.map[x // 20, y // 20] = 1 empty_pos = [pos for pos in self.map.keys() if not self.map[pos]] result = choice(empty_pos) self.food_color = list(choice(self.colors)) self.food = [result[0] * 20, result[1] * 20] def main(): key_direction_dict = { 119: 'up', # W 115: 'down', # S 97: 'left', # A 100: 'right', # D 273: 'up', # UP 274: 'down', # DOWN 276: 'left', # LEFT 275: 'right', # RIGHT } fps_clock = pygame.time.Clock() pygame.init() pygame.mixer.init() snake = Snake() sound = False if path.exists('eat.wav'): sound_wav = pygame.mixer.Sound("eat.wav") sound = True title_font = pygame.font.SysFont('simsunnsimsun', 32) welcome_words = title_font.render( '贪吃蛇', True, (0, 0, 0), (255, 255, 255)) tips_font = pygame.font.SysFont('simsunnsimsun', 20) start_game_words = tips_font.render( '点击开始', True, (0, 0, 0), (255, 255, 255)) close_game_words = tips_font.render( '按ESC退出', True, (0, 0, 0), (255, 255, 255)) gameover_words = title_font.render( '游戏结束', True, (205, 92, 92), (255, 255, 255)) win_words = title_font.render( '蛇很长了,你赢了! 
', True, (0, 0, 205), (255, 255, 255)) screen = pygame.display.set_mode((640, 480), 0, 32) pygame.display.set_caption('贪吃蛇') new_direction = snake.moving_direction while 1: for event in pygame.event.get(): if event.type == QUIT: exit() elif event.type == KEYDOWN: if event.key == 27: exit() if snake.game_started and event.key in key_direction_dict: direction = key_direction_dict[event.key] new_direction = direction_check( snake.moving_direction, direction) elif (not snake.game_started) and event.type == pygame.MOUSEBUTTONDOWN: x, y = pygame.mouse.get_pos() if 213 <= x <= 422 and 304 <= y <= 342: snake.game_started = True screen.fill((255, 255, 255)) if snake.game_started: snake.moving_direction = new_direction # 在这里赋值,而不是在event事件的循环中赋值,避免按键太快 snake.move_head() snake.body.append(snake.head[:]) if snake.head == snake.food: if sound: sound_wav.play() snake.generate_food() else: snake.body.pop(0) for seg in snake.body: pygame.draw.rect(screen, [0, 0, 0], [ seg[0], seg[1], 20, 20], 0) pygame.draw.rect(screen, snake.food_color, [ snake.food[0], snake.food[1], 20, 20], 0) if snake.check_game_status(): screen.blit(gameover_words, (241, 310)) pygame.display.update() snake = Snake() new_direction = snake.moving_direction sleep(3) elif len(snake.body) == 512: screen.blit(win_words, (33, 210)) pygame.display.update() snake = Snake() new_direction = snake.moving_direction sleep(3) else: screen.blit(welcome_words, (240, 150)) screen.blit(start_game_words, (246, 310)) screen.blit(close_game_words, (246, 350)) pygame.display.update() fps_clock.tick(snake.speed) if __name__ == '__main__': main()
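
    A small, self-contained sketch of the movement rule from the design notes (append the next head position, then drop the tail unless food was eaten):
    # body is a list of [x, y] points; the last element is the head.
    def step(body, direction, ate_food=False):
        moves = {'right': (20, 0), 'left': (-20, 0), 'up': (0, -20), 'down': (0, 20)}
        dx, dy = moves[direction]
        head = [body[-1][0] + dx, body[-1][1] + dy]
        new_body = body + [head]
        if not ate_food:
            new_body = new_body[1:]        # tail drops off, so the length stays the same
        return new_body

    print(step([[100, 100], [120, 100], [140, 100]], 'right'))
    # [[120, 100], [140, 100], [160, 100]]
    print(step([[100, 100], [120, 100], [140, 100]], 'right', ate_food=True))
    # [[100, 100], [120, 100], [140, 100], [160, 100]]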

    3、俄罗斯方块

    1. 案例介绍 俄罗斯方块是由 4 个小方块组成不同形状的板块,随机从屏幕上方落下,按方向键调整板块的位置和方向,在底部拼出完整的一行或几行。 这些完整的横条会消失,给新落下来的板块腾出空间,并获得分数奖励。 没有被消除掉的方块不断堆积,一旦堆到顶端,便告输,游戏结束。 本例难度为高级,适合具有 Python 进阶和 Pygame 编程技巧的用户学习。 2. 设计要点 边框――由 15*25 个空格组成,方块就落在这里面。 盒子――组成方块的其中小方块,是组成方块的基本单元。 方块――从边框顶掉下的东西,游戏者可以翻转和改变位置。 每个方块由 4 个盒子组成。 形状――不同类型的方块。 这里形状的名字被叫做 T, S, Z ,J, L, I , O。 如下图所示: 模版――用一个列表存放形状被翻转后的所有可能样式。 全部存放在变量里,变量名字如 S or J。 着陆――当一个方块到达边框的底部或接触到在其他的盒子话,就说这个方块着陆了。 那样的话,另一个方块就会开始下落。 3. 示例效果 4. 示例源码 import pygame import random import os pygame.init() GRID_WIDTH = 20 GRID_NUM_WIDTH = 15 GRID_NUM_HEIGHT = 25 WIDTH, HEIGHT = GRID_WIDTH * GRID_NUM_WIDTH, GRID_WIDTH * GRID_NUM_HEIGHT SIDE_WIDTH = 200 SCREEN_WIDTH = WIDTH + SIDE_WIDTH WHITE = (0xff, 0xff, 0xff) BLACK = (0, 0, 0) LINE_COLOR = (0x33, 0x33, 0x33) CUBE_COLORS = [ (0xcc, 0x99, 0x99), (0xff, 0xff, 0x99), (0x66, 0x66, 0x99), (0x99, 0x00, 0x66), (0xff, 0xcc, 0x00), (0xcc, 0x00, 0x33), (0xff, 0x00, 0x33), (0x00, 0x66, 0x99), (0xff, 0xff, 0x33), (0x99, 0x00, 0x33), (0xcc, 0xff, 0x66), (0xff, 0x99, 0x00) ] screen = pygame.display.set_mode((SCREEN_WIDTH, HEIGHT)) pygame.display.set_caption("俄罗斯方块") clock = pygame.time.Clock() FPS = 30 score = 0 level = 1 screen_color_matrix = [[None] * GRID_NUM_WIDTH for i in range(GRID_NUM_HEIGHT)] # 设置游戏的根目录为当前文件夹 base_folder = os.path.dirname(__file__) def show_text(surf, text, size, x, y, color=WHITE): font_name = os.path.join(base_folder, 'font/font.ttc') font = pygame.font.Font(font_name, size) text_surface = font.render(text, True, color) text_rect = text_surface.get_rect() text_rect.midtop = (x, y) surf.blit(text_surface, text_rect) class CubeShape(object): SHAPES = ['I', 'J', 'L', 'O', 'S', 'T', 'Z'] I = [[(0, -1), (0, 0), (0, 1), (0, 2)], [(-1, 0), (0, 0), (1, 0), (2, 0)]] J = [[(-2, 0), (-1, 0), (0, 0), (0, -1)], [(-1, 0), (0, 0), (0, 1), (0, 2)], [(0, 1), (0, 0), (1, 0), (2, 0)], [(0, -2), (0, -1), (0, 0), (1, 0)]] L = [[(-2, 0), (-1, 0), (0, 0), (0, 1)], [(1, 0), (0, 0), (0, 1), (0, 2)], [(0, -1), (0, 0), (1, 0), (2, 0)], [(0, -2), (0, -1), (0, 0), (-1, 0)]] O = [[(0, 0), (0, 1), (1, 0), (1, 1)]] S = [[(-1, 0), (0, 0), (0, 1), (1, 1)], [(1, -1), (1, 0), (0, 0), (0, 1)]] T = [[(0, -1), (0, 0), (0, 1), (-1, 0)], [(-1, 0), (0, 0), (1, 0), (0, 1)], [(0, -1), (0, 0), (0, 1), (1, 0)], [(-1, 0), (0, 0), (1, 0), (0, -1)]] Z = [[(0, -1), (0, 0), (1, 0), (1, 1)], [(-1, 0), (0, 0), (0, -1), (1, -1)]] SHAPES_WITH_DIR = { 'I': I, 'J': J, 'L': L, 'O': O, 'S': S, 'T': T, 'Z': Z } def __init__(self): self.shape = self.SHAPES[random.randint(0, len(self.SHAPES) - 1)] # 骨牌所在的行列 self.center = (2, GRID_NUM_WIDTH // 2) self.dir = random.randint(0, len(self.SHAPES_WITH_DIR[self.shape]) - 1) self.color = CUBE_COLORS[random.randint(0, len(CUBE_COLORS) - 1)] def get_all_gridpos(self, center=None): curr_shape = self.SHAPES_WITH_DIR[self.shape][self.dir] if center is None: center = [self.center[0], self.center[1]] return [(cube[0] + center[0], cube[1] + center[1]) for cube in curr_shape] def conflict(self, center): for cube in self.get_all_gridpos(center): # 超出屏幕之外,说明不合法 if cube[0] < 0 or cube[1] < 0 or cube[0] >= GRID_NUM_HEIGHT or \ cube[1] >= GRID_NUM_WIDTH: return True # 不为None,说明之前已经有小方块存在了,也不合法 if screen_color_matrix[cube[0]][cube[1]] is not None: return True return False def rotate(self): new_dir = self.dir + 1 new_dir %= len(self.SHAPES_WITH_DIR[self.shape]) old_dir = self.dir self.dir = new_dir if self.conflict(self.center): self.dir = old_dir return False def down(self): # import pdb; pdb.set_trace() center = (self.center[0] + 1, self.center[1]) if 
self.conflict(center): return False self.center = center return True def left(self): center = (self.center[0], self.center[1] - 1) if self.conflict(center): return False self.center = center return True def right(self): center = (self.center[0], self.center[1] + 1) if self.conflict(center): return False self.center = center return True def draw(self): for cube in self.get_all_gridpos(): pygame.draw.rect(screen, self.color, (cube[1] * GRID_WIDTH, cube[0] * GRID_WIDTH, GRID_WIDTH, GRID_WIDTH)) pygame.draw.rect(screen, WHITE, (cube[1] * GRID_WIDTH, cube[0] * GRID_WIDTH, GRID_WIDTH, GRID_WIDTH), 1) def draw_grids(): for i in range(GRID_NUM_WIDTH): pygame.draw.line(screen, LINE_COLOR, (i * GRID_WIDTH, 0), (i * GRID_WIDTH, HEIGHT)) for i in range(GRID_NUM_HEIGHT): pygame.draw.line(screen, LINE_COLOR, (0, i * GRID_WIDTH), (WIDTH, i * GRID_WIDTH)) pygame.draw.line(screen, WHITE, (GRID_WIDTH * GRID_NUM_WIDTH, 0), (GRID_WIDTH * GRID_NUM_WIDTH, GRID_WIDTH * GRID_NUM_HEIGHT)) def draw_matrix(): for i, row in zip(range(GRID_NUM_HEIGHT), screen_color_matrix): for j, color in zip(range(GRID_NUM_WIDTH), row): if color is not None: pygame.draw.rect(screen, color, (j * GRID_WIDTH, i * GRID_WIDTH, GRID_WIDTH, GRID_WIDTH)) pygame.draw.rect(screen, WHITE, (j * GRID_WIDTH, i * GRID_WIDTH, GRID_WIDTH, GRID_WIDTH), 2) def draw_score(): show_text(screen, u'得分: {}'.format(score), 20, WIDTH + SIDE_WIDTH // 2, 100) def remove_full_line(): global screen_color_matrix global score global level new_matrix = [[None] * GRID_NUM_WIDTH for i in range(GRID_NUM_HEIGHT)] index = GRID_NUM_HEIGHT - 1 n_full_line = 0 for i in range(GRID_NUM_HEIGHT - 1, -1, -1): is_full = True for j in range(GRID_NUM_WIDTH): if screen_color_matrix[i][j] is None: is_full = False continue if not is_full: new_matrix[index] = screen_color_matrix[i] index -= 1 else: n_full_line += 1 score += n_full_line level = score // 20 + 1 screen_color_matrix = new_matrix def show_welcome(screen): show_text(screen, u'俄罗斯方块', 30, WIDTH / 2, HEIGHT / 2) show_text(screen, u'按任意键开始游戏', 20, WIDTH / 2, HEIGHT / 2 + 50) running = True gameover = True counter = 0 live_cube = None while running: clock.tick(FPS) for event in pygame.event.get(): if event.type == pygame.QUIT: running = False elif event.type == pygame.KEYDOWN: if gameover: gameover = False live_cube = CubeShape() break if event.key == pygame.K_LEFT: live_cube.left() elif event.key == pygame.K_RIGHT: live_cube.right() elif event.key == pygame.K_DOWN: live_cube.down() elif event.key == pygame.K_UP: live_cube.rotate() elif event.key == pygame.K_SPACE: while live_cube.down() == True: pass remove_full_line() # level 是为了方便游戏的难度,level 越高 FPS // level 的值越小 # 这样屏幕刷新的就越快,难度就越大 if gameover is False and counter % (FPS // level) == 0: # down 表示下移骨牌,返回False表示下移不成功,可能超过了屏幕或者和之前固定的 # 小方块冲突了 if live_cube.down() == False: for cube in live_cube.get_all_gridpos(): screen_color_matrix[cube[0]][cube[1]] = live_cube.color live_cube = CubeShape() if live_cube.conflict(live_cube.center): gameover = True score = 0 live_cube = None screen_color_matrix = [[None] * GRID_NUM_WIDTH for i in range(GRID_NUM_HEIGHT)] # 消除满行 remove_full_line() counter += 1 # 更新屏幕 screen.fill(BLACK) draw_grids() draw_matrix() draw_score() if live_cube is not None: live_cube.draw() if gameover: show_welcome(screen) pygame.display.update()
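
    A minimal sketch of the line-clearing idea used above: keep every row that still has an empty cell and pad fresh empty rows on top (the game's remove_full_line builds the new matrix bottom-up, which is equivalent):
    def clear_full_rows(matrix, width):
        kept = [row for row in matrix if any(cell is None for cell in row)]
        cleared = len(matrix) - len(kept)
        return [[None] * width for _ in range(cleared)] + kept, cleared

    board = [
        [None, 'x', None],
        ['x',  'x', 'x'],   # full row, will be cleared
        ['x',  None, 'x'],
    ]
    board, n = clear_full_rows(board, 3)
    print(n)       # 1
    print(board)   # [[None, None, None], [None, 'x', None], ['x', None, 'x']]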

    4、连连看

    1. 案例介绍 连连看是一款曾经非常流行的小游戏。 游戏规则:
    1. 点击选中两个相同的方块。
    2. 两个选中的方块之间连接线的折点不超过两个(连线由X轴和Y轴的平行线组成)。
    3. 每找出一对,它们就会自动消失。
    4. 连线不能从尚未消失的图案上经过。
    5. 把所有的图案全部消除即可获得胜利。
    2. 设计思路
    1. 生成成对的图片元素。
    2. 将图片元素打乱排布。
    3. 定义什么才算 相连(两张图片的连线不多于3根直线,或者说转角不超过2个)。
    4. 实现 相连 判断算法。
    5. 消除图片元素并判断是否消除完毕。
    3. 示例效果 4. 示例源码 from tkinter import * from tkinter.messagebox import * from threading import Timer import time import random class Point: # 点类 def __init__(self, x, y): self.x = x self.y = y # -------------------------------------- ''' 判断选中的两个方块是否可以消除 ''' def IsLink(p1, p2): if lineCheck(p1, p2): return True if OneCornerLink(p1, p2): # 一个转弯(折点)的联通方式 return True if TwoCornerLink(p1, p2): # 两个转弯(折点)的联通方式 return True return False # --------------------------- def IsSame(p1, p2): if map[p1.x][p1.y] == map[p2.x][p2.y]: print("clicked at IsSame") return True return False def callback(event): # 鼠标左键事件代码 global Select_first, p1, p2 global firstSelectRectId, SecondSelectRectId # print ("clicked at", event.x, event.y,turn) x = (event.x) // 40 # 换算棋盘坐标 y = (event.y) // 40 print("clicked at", x, y) if map[x][y] == " ": showinfo(title="提示", message="此处无方块") else: if Select_first == False: p1 = Point(x, y) # 画选定(x1,y1)处的框线 firstSelectRectId = cv.create_rectangle(x * 40, y * 40, x * 40 + 40, y * 40 + 40, width=2, outline="blue") Select_first = True else: p2 = Point(x, y) # 判断第二次点击的方块是否已被第一次点击选取,如果是则返回。 if (p1.x == p2.x) and (p1.y == p2.y): return # 画选定(x2,y2)处的框线 print('第二次点击的方块', x, y) # SecondSelectRectId=cv.create_rectangle(100,20,x*40+40,y*40+40,width=2,outline="yellow") SecondSelectRectId = cv.create_rectangle(x * 40, y * 40, x * 40 + 40, y * 40 + 40, width=2, outline="yellow") print('第二次点击的方块', SecondSelectRectId) cv.pack() # 判断是否连通 if IsSame(p1, p2) and IsLink(p1, p2): print('连通', x, y) Select_first = False # 画选中方块之间连接线 drawLinkLine(p1, p2) # clearTwoBlock() # time.sleep(0.6) # clearFlag=True t = Timer(timer_interval, delayrun) # 定时函数 t.start() else: # 重新选定第一个方块 # 清除第一个选定框线 cv.delete(firstSelectRectId) cv.delete(SecondSelectRectId) # print('清除第一个选定框线') # firstSelectRectId=SecondSelectRectId # p1=Point(x,y) #设置重新选定第一个方块的坐标 Select_first = False timer_interval = 0.3 # 0.3秒 # -------------------------------------- def delayrun(): clearTwoBlock() # 清除连线及方块 def clearTwoBlock(): # 清除连线及方块 # 延时0.1秒 # time.sleep(0.1) # 清除第一个选定框线 cv.delete(firstSelectRectId) # 清除第2个选定框线 cv.delete(SecondSelectRectId) # 清空记录方块的值 map[p1.x][p1.y] = " " cv.delete(image_map[p1.x][p1.y]) map[p2.x][p2.y] = " " cv.delete(image_map[p2.x][p2.y]) Select_first = False undrawConnectLine() # 清除选中方块之间连接线 def drawQiPan(): # 画棋盘 for i in range(0, 15): cv.create_line(20, 20 + 40 * i, 580, 20 + 40 * i, width=2) for i in range(0, 15): cv.create_line(20 + 40 * i, 20, 20 + 40 * i, 580, width=2) cv.pack() def print_map(): # 输出map地图 global image_map for x in range(0, Width): # 0--14 for y in range(0, Height): # 0--14 if (map[x][y] != ' '): img1 = imgs[int(map[x][y])] id = cv.create_image((x * 40 + 20, y * 40 + 20), image=img1) image_map[x][y] = id cv.pack() for y in range(0, Height): # 0--14 for x in range(0, Width): # 0--14 print(map[x][y], end=' ') print(",", y) ''' * 同行同列情况消除方法 原理: 如果两个相同的被消除元素之间的 空格数 spaceCount等于他们的(行/列差-1)则 两者可以联通消除 * x代表列,y代表行 * param p1 第一个保存上次选中点坐标的点对象 * param p2 第二个保存上次选中点坐标的点对象 ''' # 直接连通 def lineCheck(p1, p2): absDistance = 0 spaceCount = 0 if (p1.x == p2.x or p1.y == p2.y): # 同行同列的情况吗? 
print("同行同列的情况------") # 同列的情况 if (p1.x == p2.x and p1.y != p2.y): print("同列的情况") # 绝对距离(中间隔着的空格数) absDistance = abs(p1.y - p2.y) - 1 # 正负值 if p1.y - p2.y > 0: zf = -1 else: zf = 1 for i in range(1, absDistance + 1): if (map[p1.x][p1.y + i * zf] == " "): # 空格数加1 spaceCount += 1 else: break; # 遇到阻碍就不用再探测了 # 同行的情况 elif (p1.y == p2.y and p1.x != p2.x): print(" 同行的情况") absDistance = abs(p1.x - p2.x) - 1 # 正负值 if p1.x - p2.x > 0: zf = -1 else: zf = 1 for i in range(1, absDistance + 1): if (map[p1.x + i * zf][p1.y] == " "): # 空格数加1 spaceCount += 1 else: break; # 遇到阻碍就不用再探测了 if (spaceCount == absDistance): # 可联通 print(absDistance, spaceCount) print("行/列可直接联通") return True else: print("行/列不能消除! ") return False else: # 不是同行同列的情况所以直接返回false return False; # -------------------------------------- # 第二种,直角连通 ''' 直角连接,即X,Y坐标都不同的,可以用这个方法尝试连接 param first:选中的第一个点 param second:选中的第二个点 ''' def OneCornerLink(p1, p2): # 第一个直角检查点,如果这里为空则赋予相同值供检查 checkP = Point(p1.x, p2.y) # 第二个直角检查点,如果这里为空则赋予相同值供检查 checkP2 = Point(p2.x, p1.y); # 第一个直角点检测 if (map[checkP.x][checkP.y] == " "): if (lineCheck(p1, checkP) and lineCheck(checkP, p2)): linePointStack.append(checkP) print("直角消除ok", checkP.x, checkP.y) return True # 第二个直角点检测 if (map[checkP2.x][checkP2.y] == " "): if (lineCheck(p1, checkP2) and lineCheck(checkP2, p2)): linePointStack.append(checkP2) print("直角消除ok", checkP2.x, checkP2.y) return True print("不能直角消除") return False; # ----------------------------------------- ''' #第三种,双直角连通 双直角联通判定可分两步走: 1. 在p1点周围4个方向寻找空格checkP 2. 调用OneCornerLink(checkP, p2) 3. 即遍历 p1 4 个方向的空格,使之成为 checkP,然后调用 OneCornerLink(checkP, p2)判定是否为真,如果为真则可以双直角连同,否则当所有的空格都遍历完而没有找 到一个checkP使OneCornerLink(checkP, p2)为真,则两点不能连同 具体代码: 双直角连接方法 @param p1 第一个点 @param p2 第二个点 ''' def TwoCornerLink(p1, p2): checkP = Point(p1.x, p1.y) # 四向探测开始 for i in range(0, 4): checkP.x = p1.x checkP.y = p1.y # 向下 if (i == 3): checkP.y += 1 while ((checkP.y < Height) and map[checkP.x][checkP.y] == " "): linePointStack.append(checkP) if (OneCornerLink(checkP, p2)): print("下探测OK") return True else: linePointStack.pop() checkP.y += 1 print("ssss", checkP.y, Height - 1) # 补充两个折点都在游戏区域底侧外部 if checkP.y == Height: # 出了底部,则仅需判断p2能否也达到底部边界 z = Point(p2.x, Height - 1) # 底部边界点 if lineCheck(z, p2): # 两个折点在区域外部的底侧 linePointStack.append(Point(p1.x, Height)) linePointStack.append(Point(p2.x, Height)) print("下探测到游戏区域外部OK") return True # 向右 elif (i == 2): checkP.x += 1 while ((checkP.x < Width) and map[checkP.x][checkP.y] == " "): linePointStack.append(checkP) if (OneCornerLink(checkP, p2)): print("右探测OK") return True else: linePointStack.pop() checkP.x += 1 # 补充两个折点都在游戏区域右侧外部 if checkP.x == Width: # 出了右侧,则仅需判断p2能否也达到右部边界 z = Point(Width - 1, p2.y) # 右部边界点 if lineCheck(z, p2): # 两个折点在区域外部的底侧 linePointStack.append(Point(Width, p1.y)) linePointStack.append(Point(Width, p2.y)) print("右探测到游戏区域外部OK") return True # 向左 elif (i == 1): checkP.x -= 1 while ((checkP.x >= 0) and map[checkP.x][checkP.y] == " "): linePointStack.append(checkP) if (OneCornerLink(checkP, p2)): print("左探测OK") return True else: linePointStack.pop() checkP.x -= 1 # 向上 elif (i == 0): checkP.y -= 1 while ((checkP.y >= 0) and map[checkP.x][checkP.y] == " "): linePointStack.append(checkP) if (OneCornerLink(checkP, p2)): print("上探测OK") return True else: linePointStack.pop() checkP.y -= 1 # 四个方向都寻完都没找到适合的checkP点 print("两直角连接没找到适合的checkP点") return False; # --------------------------- # 画连接线 def drawLinkLine(p1, p2): if (len(linePointStack) == 0): Line_id.append(drawLine(p1, p2)) else: print(linePointStack, len(linePointStack)) if 
(len(linePointStack) == 1): z = linePointStack.pop() print("一折连通点z", z.x, z.y) Line_id.append(drawLine(p1, z)) Line_id.append(drawLine(p2, z)) if (len(linePointStack) == 2): z1 = linePointStack.pop() print("2折连通点z1", z1.x, z1.y) Line_id.append(drawLine(p2, z1)) z2 = linePointStack.pop() print("2折连通点z2", z2.x, z2.y) Line_id.append(drawLine(z1, z2)) Line_id.append(drawLine(p1, z2)) # 删除连接线 def undrawConnectLine(): while len(Line_id) > 0: idpop = Line_id.pop() cv.delete(idpop) def drawLine(p1, p2): print("drawLine p1,p2", p1.x, p1.y, p2.x, p2.y) # cv.create_line( 40+20, 40+20,200,200,width=5,fill='red') id = cv.create_line(p1.x * 40 + 20, p1.y * 40 + 20, p2.x * 40 + 20, p2.y * 40 + 20, width=5, fill='red') # cv.pack() return id # -------------------------------------- def create_map(): # 产生map地图 global map # 生成随机地图 # 将所有匹配成对的动物物种放进一个临时的地图中 tmpMap = [] m = (Width) * (Height) // 10 print('m=', m) for x in range(0, m): for i in range(0, 10): # 每种方块有10个 tmpMap.append(x) random.shuffle(tmpMap) for x in range(0, Width): # 0--14 for y in range(0, Height): # 0--14 map[x][y] = tmpMap[x * Height + y] # -------------------------------------- def find2Block(event): # 自动查找 global firstSelectRectId, SecondSelectRectId m_nRoW = Height m_nCol = Width bFound = False; # 第一个方块从地图的0位置开始 for i in range(0, m_nRoW * m_nCol): # 找到则跳出循环 if (bFound): break # 算出对应的虚拟行列位置 x1 = i % m_nCol y1 = i // m_nCol p1 = Point(x1, y1) # 无图案的方块跳过 if (map[x1][y1] == ' '): continue # 第二个方块从前一个方块的后面开始 for j in range(i + 1, m_nRoW * m_nCol): # 算出对应的虚拟行列位置 x2 = j % m_nCol y2 = j // m_nCol p2 = Point(x2, y2) # 第二个方块不为空 且与第一个方块的动物相同 if (map[x2][y2] != ' ' and IsSame(p1, p2)): # 判断是否可以连通 if (IsLink(p1, p2)): bFound = True break # 找到后自动消除 if (bFound): # p1(x1,y1)与p2(x2,y2)连通 print('找到后', p1.x, p1.y, p2.x, p2.y) # 画选定(x1,y1)处的框线 firstSelectRectId = cv.create_rectangle(x1 * 40, y1 * 40, x1 * 40 + 40, y1 * 40 + 40, width=2, outline="red") # 画选定(x2,y2)处的框线 secondSelectRectId = cv.create_rectangle(x2 * 40, y2 * 40, x2 * 40 + 40, y2 * 40 + 40, width=2, outline="red") # t=Timer(timer_interval,delayrun)#定时函数 # t.start() return bFound # 游戏主逻辑 root = Tk() root.title("Python连连看 ") imgs = [PhotoImage(file='images\\bar_0' + str(i) + '.gif') for i in range(0, 10)] # 所有图标图案 Select_first = False # 是否已经选中第一块 firstSelectRectId = -1 # 被选中第一块地图对象 SecondSelectRectId = -1 # 被选中第二块地图对象 clearFlag = False linePointStack = [] Line_id = [] Height = 10 Width = 10 map = [[" " for y in range(Height)] for x in range(Width)] image_map = [[" " for y in range(Height)] for x in range(Width)] cv = Canvas(root, bg='green', width=440, height=440) # drawQiPan( ) cv.bind("<Button-1>", callback) # 鼠标左键事件 cv.bind("<Button-3>", find2Block) # 鼠标右键事件 cv.pack() create_map() # 产生map地图 print_map() # 打印map地图 root.mainloop()
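
    A compact sketch of the straight-line check (lineCheck) described in the design notes: two cells in the same row or column connect directly when every cell between them is empty; the one-corner and two-corner checks are built on top of it:
    # grid is indexed grid[x][y] like the map above; ' ' marks an empty cell.
    def line_clear(grid, p1, p2):
        (x1, y1), (x2, y2) = p1, p2
        if x1 == x2 and y1 != y2:                      # same column of cells
            lo, hi = sorted((y1, y2))
            return all(grid[x1][y] == ' ' for y in range(lo + 1, hi))
        if y1 == y2 and x1 != x2:                      # same row of cells
            lo, hi = sorted((x1, x2))
            return all(grid[x][y1] == ' ' for x in range(lo + 1, hi))
        return False

    g = [[' ', 'A', ' '],
         [' ', ' ', ' '],
         [' ', 'A', ' ']]
    print(line_clear(g, (0, 1), (2, 1)))   # True  -- the cell between them is empty
    print(line_clear(g, (0, 1), (0, 2)))   # True  -- adjacent cells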

    Python获取全部股票代码信息

    Angryshark128 # 保存股票信息至本地 save_stocks() # 获取全部股票代码及名称 _get_all_stocks() # 获取股票信息 _get_stocks(base_url, stock) # 保存股票信息至本地 def save_stocks(): all_stocks = target_util._get_all_stocks() with open ("stock.csv",'a+') as f: f.write("股票代码,股票名称,市场,分类,类型\n") for stock in all_stocks: f.write("{stock[id]},{stock[name]},{stock[category]},{stock[tag]},{stock[type]}\n".format( stock=stock )) logging.info("全部股票信息写入完成!") if __name__ == "__main__": save_stocks() # 获取全部股票代码及名称 def _get_all_stocks(): base_url = "http://54.push2.eastmoney.com/api/qt/clist/get?pn={page_num}&pz={page_size}&po=1&np=1&fltt=2&invt=2&fid=f3&fs={time_id}&fields=f12,f14" stocks = [ { "category": "A股", "tag": "沪深A股", "type": "股票", "time_id": "m:0+t:6,m:0+t:80,m:1+t:2,m:1+t:23" }, { "category": "A股", "tag": "上证A股", "type": "股票", "time_id": "m:1+t:2,m:1+t:23" }, { "category": "A股", "tag": "深证A股", "type": "股票", "time_id": "m:0+t:6,m:0+t:80" }, { "category": "A股", "tag": "新股", "type": "股票", "time_id": "m:0+f:8,m:1+f:8" }, { "category": "A股", "tag": "创业板", "type": "股票", "time_id": "m:0+t:80" }, { "category": "A股", "tag": "科创板", "type": "股票", "time_id": "m:1+t:23" }, { "category": "A股", "tag": "沪股通", "type": "股票", "time_id": "b:BK0707" }, { "category": "A股", "tag": "深股通", "type": "股票", "time_id": "b:BK0804" }, { "category": "B股", "tag": "B股", "type": "股票", "time_id": "m:0+t:7,m:1+t:3" }, { "category": "A-B股", "tag": "上证AB股比价", "type": "股票", "time_id": "m:1+b:BK0498" }, { "category": "A-B股", "tag": "深证AB股比价", "type": "股票", "time_id": "m:0+b:BK0498" }, { "category": "A-B股", "tag": "风险警示板", "type": "股票", "time_id": "m:0+f:4,m:1+f:4" }, { "category": "A-B股", "tag": "两网及退市", "type": "股票", "time_id": "m:0+s:3" }, { "category": "美股", "tag": "美股", "type": "股票", "time_id": "m:105,m:106,m:107" }, { "category": "港股", "tag": "港股", "type": "股票", "time_id": "m:128+t:3,m:128+t:4,m:128+t:1,m:128+t:2" }, { "category": "英股", "tag": "英股", "type": "股票", "time_id": "m:155+t:1,m:155+t:2,m:155+t:3,m:156+t:1,m:156+t:2,m:156+t:5,m:156+t:6,m:156+t:7,m:156+t:8" } ] all_stocks = [] for stock in stocks: all_stocks.extend(_get_stocks(base_url, stock)) logging.warning("全部股票信息共{0}条。".format(len(all_stocks))) return all_stocks # 获取股票信息 def _get_stocks(base_url, stock): max_page_num = 50 page_size = 100 result = [] for page_num in range(1, max_page_num): url = base_url.format(time_id=stock["time_id"], page_num=page_num, page_size=page_size) resp = requests.get(url) if not resp.ok: logging.error("{0}-{1}-{2}请求失败:{3}".format(stock["type"], stock["category"], stock["tag"], url)) resp_json = resp.json() if not resp_json["data"]: logging.warning("当前页无数据,将不再继续请求!") break stocks = resp_json["data"]["diff"] result.extend(list( map(lambda s: {"id": s["f12"].replace(" ", "").replace("'", "_"), "name": s["f14"].replace(" ", "").replace("'", "_"), "category": stock["category"], "tag": stock["tag"], "type": stock["type"]}, stocks))) logging.info("{0}-{1}-{2}信息爬取完成,共{3}条。".format(stock["type"], stock["category"], stock["tag"], len(result))) return result
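
    The script above is shown without its imports; as a self-contained illustration, here is a minimal request against the same eastmoney endpoint for a single page (assuming the endpoint still responds with the data/diff structure the code above expects):
    import requests

    # One page of the endpoint used by _get_stocks(); f12 = code, f14 = name.
    url = ("http://54.push2.eastmoney.com/api/qt/clist/get"
           "?pn=1&pz=5&po=1&np=1&fltt=2&invt=2&fid=f3"
           "&fs=m:1+t:23&fields=f12,f14")   # 科创板, first 5 rows
    resp = requests.get(url, timeout=10)
    data = resp.json().get("data") or {}
    for row in data.get("diff", []):
        print(row["f12"], row["f14"])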

    YYDS 分析股票数据特征

    数据准备
    特征构造
    描述性统计
    缺失值分析
    特征间相关性分析
    特征值分布
    特征间的关系
    特征重要性
    线性回归系数大小排序
    随机森林特征重要性排序
    RandomizedLasso
    RFE递归特征消除特征排序
    RFECV
    LarsCV
    创建特征排序矩阵
    绘制特征重要性排序图


    本文主要从股市数据变量的特征分布及特征重要性两个角度对数据进行分析。 通过绘制图表等方法分析特征本身的分布状况以及特征间的相互关系。 通过机器学习模型方法分析出特征重要性排序,选出对结果贡献较大的那几个特征,这对后面建模的模型效果有着不可小觑的作用。

    数据准备

    此处数据获取可参见金融数据准备。 df.info() <class 'pandas.core.frame.DataFrame'> DatetimeIndex: 1260 entries, 2015-12-31 to 2020-12-31 Data columns (total 6 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Open 1260 non-null float64 1 High 1260 non-null float64 2 Low 1260 non-null float64 3 Close 1260 non-null float64 4 Adj Close 1260 non-null float64 5 Volume 1260 non-null int64 dtypes: float64(5), int64(1) memory usage: 68.9 KB
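
    The frame above uses Yahoo-style OHLCV columns. A minimal, hedged sketch of producing an equivalent df (yfinance and the AAPL ticker are assumptions here, not necessarily what the 金融数据准备 article uses); the later cells also assume numpy/pandas/matplotlib are imported:
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import yfinance as yf

    # Placeholder ticker and date range; any symbol with enough daily history works.
    df = yf.download("AAPL", start="2015-12-31", end="2020-12-31",
                     auto_adjust=False, progress=False)
    df.info()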

    特征构造

    df['H-L'] = df['High'] - df['Low']
    df['O-C'] = df['Adj Close'] - df['Open']
    df['3day MA'] = df['Adj Close'].shift(1).rolling(window=3).mean()
    df['10day MA'] = df['Adj Close'].shift(1).rolling(window=10).mean()
    df['30day MA'] = df['Adj Close'].shift(1).rolling(window=30).mean()
    df['Std_dev'] = df['Adj Close'].rolling(5).std()
    df.dtypes

    描述性统计

    df.describe().T

    缺失值分析

    检查缺失值

    df.isnull().sum() Open 0 High 0 Low 0 Close 0 Adj Close 0 Volume 0 H-L 0 O-C 0 3day MA 3 10day MA 10 30day MA 30 Std_dev 4 dtype: int64

    缺失值可视化

    这里使用Series的属性plot直接绘制条形图。 df_missing_count = df.isnull().sum() # -1表示缺失数据 # 另一个不常见的设置画布的方法 plt.rcParams['figure.figsize'] = (15,8) df_missing_count.plot.bar() plt.show() for column in df: print("column nunique NaN") print("{0:15} {1:6d} {2:6}".format( column, df[column].nunique(), (df[column] == -1).sum())) column nunique NaN Open 1082 0 High 1083 0 Low 1025 0 Close 1098 0 Adj Close 1173 0 Volume 1250 0 H-L 357 0 O-C 1237 2 3day MA 1240 0 10day MA 1244 0 30day MA 1230 0 Std_dev 1252 0

    特征间相关性分析

    import seaborn as sns # 一个设置色板的方法 # cmap = sns.diverging_palette(220, 10, as_cmap=True) sns.heatmap(df.iloc[:df.shape[0]].corr() ,annot = True, cmap = 'Blues')

    特征值分布

    直方图

    columns_multi = [x for x in list(df.columns)] df.hist(layout = (3,4), column = columns_multi) # 一种不常用的调整画布大小的方法 fig=plt.gcf() fig.set_size_inches(20,9)

    密度图

    names = columns_multi df.plot(kind='density', subplots=True, layout=(3,4), sharex=False)

    特征间的关系

    函数可视化探索数据特征间的关系 sns.pairplot(df, size=3, diag_kind="kde")

    特征重要性

    通过多种方式对特征重要性进行评估,将每个特征的特征重要的得分取均值,最后以均值大小排序绘制特征重要性排序图,直观查看特征重要性。

    导入相关模块

    import numpy as np  # 后面的排序函数会用到
    from sklearn.feature_selection import RFE, RFECV, f_regression
    from sklearn.linear_model import LinearRegression, Ridge, Lasso, LarsCV
    from stability_selection import StabilitySelection, RandomizedLasso, plot_stability_path  # plot_stability_path 在后面绘制稳定性得分时用到
    from sklearn.preprocessing import MinMaxScaler
    from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
    from sklearn.svm import SVR

    线性回归系数大小排序

    回归系数(regression coefficient)在回归方程中表示自变量 x 对因变量 y 影响大小的参数。 回归系数越大表示 x 对 y 影响越大。
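
    Coefficient magnitudes are only directly comparable when the features share a scale, which is why the ranking code below rescales them; a small sketch on synthetic data (the arrays here are illustrative, not the stock frame):
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3)) * [1, 100, 0.01]   # wildly different feature scales
    y = X @ [2.0, 0.05, 300.0] + rng.normal(scale=0.1, size=200)

    raw = LinearRegression().fit(X, y)
    std = LinearRegression().fit(StandardScaler().fit_transform(X), y)
    print(np.abs(raw.coef_).round(2))   # raw coefficients reflect units, not importance
    print(np.abs(std.coef_).round(2))   # coefficients on standardized features are comparable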

    创建排序函数

    df = df.dropna()
    Y = df['Adj Close'].values
    X = df.values
    colnames = df.columns

    # 定义字典来存储的排名
    ranks = {}

    # 创建函数,它将特征排名存储到rank字典中
    def ranking(ranks, names, order=1):
        minmax = MinMaxScaler()
        ranks = minmax.fit_transform(order * np.array([ranks]).T).T[0]
        ranks = map(lambda x: round(x, 2), ranks)
        res = dict(zip(names, ranks))
        return res

    多个回归模型系数排序

    # 使用线性回归 lr = LinearRegression(normalize=True) lr.fit(X,Y) ranks["LinReg"] = ranking(np.abs(lr.coef_), colnames) # 使用 Ridge ridge = Ridge(alpha = 7) ridge.fit(X,Y) ranks['Ridge'] = ranking(np.abs(ridge.coef_), colnames) # 使用 Lasso lasso = Lasso(alpha=.05) lasso.fit(X, Y) ranks["Lasso"] = ranking(np.abs(lasso.coef_), colnames)

    随机森林特征重要性排序

    随机森林得到的特征重要性是我们平时用得较频繁的一种方法,无论是分类型任务还是连续型任务,都有较好的效果。 在随机森林中某个特征X的重要性的计算方法如下: 对于随机森林中的每一棵决策树,使用相应的OOB(袋外数据)来计算它的袋外数据误差,记为errOOB1; 随机地对袋外数据OOB所有样本的特征X加入噪声干扰(即随机改变样本在特征X处的值),再次计算它的袋外数据误差,记为errOOB2。 假设随机森林中有Ntree棵树,那么特征X的重要性可取为 Σ(errOOB2 − errOOB1)/Ntree。 之所以可以用这个表达式来作为相应特征的重要性的度量值,是因为:若给某个特征随机加入噪声之后,袋外的准确率大幅度降低,则说明这个特征对于样本的分类结果影响很大,也就是说它的重要程度比较高。
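
    Note that the feature_importances_ attribute used in the code below is impurity-based; the OOB "add noise, watch the error grow" procedure described above corresponds more closely to permutation importance. A hedged sketch on synthetic data:
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    X, y = make_regression(n_samples=500, n_features=4, n_informative=2, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)
    # Shuffle one feature at a time on held-out data and measure the score drop,
    # the same idea as errOOB2 - errOOB1 above.
    result = permutation_importance(rf, X_te, y_te, n_repeats=10, random_state=0)
    print(result.importances_mean.round(3))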

    连续型特征重要性

    对于连续型任务的特征重要性,可以使用回归模型 RandomForestRegressor 的 feature_importances_ 属性。 注意这段示例用的 dataset 是另一份带有 Increase_Decrease、Buy_Sell 等列的数据,并非上文的 df;柱状图的宽度和特征名都应取自 X_1 本身。
    X_1 = dataset[['Open', 'High', 'Low', 'Volume', 'Increase_Decrease',
                   'Buy_Sell_on_Open', 'Buy_Sell', 'Returns']]
    y_1 = dataset['Adj Close']
    # 创建随机森林回归器对象
    clf = RandomForestRegressor(random_state=0, n_jobs=-1)
    # 训练模型
    model = clf.fit(X_1, y_1)
    # 计算特征重要性
    importances = model.feature_importances_
    # 按降序排序特征的重要性
    indices = np.argsort(importances)[::-1]
    # 重新排列特征名称,使它们与已排序的特征重要性相匹配
    names = [X_1.columns[i] for i in indices]
    # 创建画布
    plt.figure(figsize=(10, 6))
    # 添加标题
    plt.title("Feature Importance")
    # 添加柱状图(宽度取 X_1 的特征数,而不是全局 X)
    plt.bar(range(X_1.shape[1]), importances[indices])
    # 为x轴添加特征名
    plt.xticks(range(X_1.shape[1]), names, rotation=90)

    分类型特征重要性

    当该任务是分类型、需要用分类模型时,可以使用 RandomForestClassifier 的 feature_importances_ 属性。
    X2 = dataset[['Open', 'High', 'Low', 'Adj Close', 'Volume',
                  'Buy_Sell_on_Open', 'Buy_Sell', 'Returns']]
    y2 = dataset['Increase_Decrease']
    clf = RandomForestClassifier(random_state=0, n_jobs=-1)
    model = clf.fit(X2, y2)
    importances = model.feature_importances_
    indices = np.argsort(importances)[::-1]
    # 特征名同样取自 X2 本身,保证与重要性一一对应
    names = [X2.columns[i] for i in indices]
    plt.figure(figsize=(10, 6))
    plt.title("Feature Importance")
    plt.bar(range(X2.shape[1]), importances[indices])
    plt.xticks(range(X2.shape[1]), names, rotation=90)
    plt.show()

    本案例中使用回归模型

    rf = RandomForestRegressor(n_jobs=-1, n_estimators=50, verbose=3) rf.fit(X,Y) ranks["RF"] = ranking(rf.feature_importances_, colnames); 下面介绍两个顶层特征选择算法,之所以叫做顶层,是因为他们都是建立在基于模型的特征选择方法基础之上的,例如回归和SVM,在不同的子集上建立模型,然后汇总最终确定特征得分。
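
    Before the stability_selection package is used below, here is a compact, self-contained sketch of the stability-selection idea itself: run Lasso on many random subsamples and count how often each feature survives (synthetic data, illustrative only):
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Lasso

    X, y = make_regression(n_samples=300, n_features=6, n_informative=2,
                           noise=5.0, random_state=0)
    rng = np.random.default_rng(0)
    hits = np.zeros(X.shape[1])
    n_rounds = 100
    for _ in range(n_rounds):
        idx = rng.choice(len(X), size=len(X) // 2, replace=False)  # random subsample
        coef = Lasso(alpha=1.0).fit(X[idx], y[idx]).coef_
        hits += coef != 0                      # count how often each feature is kept
    print((hits / n_rounds).round(2))          # selection frequency per feature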

    RandomizedLasso

    RandomizedLasso的选择稳定性方法排序。 稳定性选择是一种基于二次抽样和选择算法相结合较新的方法,选择算法可以是回归、SVM或其他类似的方法。 它的主要思想是在不同的数据子集和特征子集上运行特征选择算法,不断的重复,最终汇总特征选择结果,比如可以统计某个特征被认为是重要特征的频率(被选为重要特征的次数除以它所在的子集被测试的次数)。 理想情况下,重要特征的得分会接近100%。 稍微弱一点的特征得分会是非0的数,而最无用的特征得分将会接近于0。 lambda_grid = np.linspace(0.001, 0.5, num=100) rlasso = RandomizedLasso(alpha=0.04) selector = StabilitySelection(base_estimator=rlasso, lambda_name='alpha', lambda_grid=lambda_grid, threshold=0.9, verbose=1) selector.fit(X, Y) # 运行随机Lasso的选择稳定性方法 ranks["rlasso/Stability"] = ranking(np.abs(selector.stability_scores_.max(axis=1)), colnames) print('finished') {'Open': 1.0, 'High': 1.0, 'Low': 0.76, 'Close': 1.0, 'Adj Close': 0.99, 'Volume': 0.0, 'H-L': 0.0, 'O-C': 1.0, '3day MA': 1.0, '10day MA': 0.27, '30day MA': 0.75, 'Std_dev': 0.0} finished

    稳定性得分可视化

    fig, ax = plot_stability_path(selector) fig.set_size_inches(15,6) fig.show()

    查看得分超过阈值的变量索引及其得分

    # 获取所选特征的掩码或整数索引 selected_variables = selector.get_support(indices=True) selected_scores = selector.stability_scores_.max(axis=1) print('Selected variables are:') print('-----------------------') for idx, (variable, score) in enumerate( zip(selected_variables, selected_scores[selected_variables])): print('Variable %d: [%d], score %.3f' % (idx + 1, variable, score)) Selected variables are: ----------------------- Variable 1: [0], score 1.000 Variable 2: [1], score 1.000 Variable 3: [3], score 1.000 Variable 4: [4], score 0.990 Variable 5: [7], score 1.000 Variable 6: [8], score 1.000

    RFE递归特征消除特征排序

    基于递归特征消除的特征排序。 给定一个给特征赋权的外部评估器(如线性模型的系数),递归特征消除(RFE)的目标是通过递归地考虑越来越小的特征集来选择特征。 主要思想是反复的构建模型(如SVM或者回归模型)然后选出最好的(或者最差的)的特征(可以根据系数来选)。 首先,在初始特征集上训练评估器,并通过任何特定属性或可调用属性来获得每个特征的重要性。 然后,从当前的特征集合中剔除最不重要的特征。 这个过程在训练集上递归地重复,直到最终达到需要选择的特征数。 这个过程中特征被消除的次序就是特征的排序。 因此,这是一种寻找最优特征子集的贪心算法。 RFE的稳定性很大程度上取决于在迭代的时候底层用哪种模型。 例如,假如RFE采用的普通的回归,没有经过正则化的回归是不稳定的,那么RFE就是不稳定的;假如采用的是Ridge,而用Ridge正则化的回归是稳定的,那么RFE就是稳定的。 sklearn.feature_selection.RFE(estimator, *, n_features_to_select=None, step=1, verbose=0, importance_getter='auto') estimator Estimator instance 一种带有""拟合""方法的监督学评估器,它提供关于特征重要性的信息(例如"coef_"、"feature_importances_")。 n_features_to_select int or float, default=None 要选择的功能的数量。 如果'None',则选择一半的特性。 如果为整数,则该参数为要选择的特征的绝对数量。 如果浮点数在0和1之间,则表示要选择的特征的分数。 step int or float, default=1 如果大于或等于1,那么'step'对应于每次迭代要删除的(整数)特征数。 如果在(0.0,1.0)范围内,则'step'对应于每次迭代中要删除的特性的百分比(向下舍入)。 verbose int, default=0 控制输出的冗长。 importance_getter str or callable, default='auto' 如果是'auto',则通过估计器的'coef_'或'feature_importances_'属性使用特征重要性。 lr = LinearRegression(normalize=True) lr.fit(X,Y) # 当且仅当剩下最后一个特性时停止搜索 rfe = RFE(lr, n_features_to_select=1, verbose =3) rfe.fit(X,Y) ranks["RFE"] = ranking(list(map(float, rfe.ranking_)), colnames, order=-1) Fitting estimator with 12 features. ... Fitting estimator with 2 features.

    RFECV

    递归特征消除交叉验证。 Sklearn 提供了 RFE 包,可以用于特征消除,还提供了 RFECV,可以通过交叉验证来对特征进行排序。
    # 实例化估计器和特征选择器
    svr_mod = SVR(kernel="linear")
    rfecv = RFECV(svr_mod, cv=5)
    # 训练模型
    rfecv.fit(X, Y)
    ranks["RFECV"] = ranking(list(map(float, rfecv.ranking_)), colnames, order=-1)
    # 输出 support 和 ranking
    print(rfecv.support_)
    print(rfecv.ranking_)
    print(colnames)  # X 是 numpy 数组,没有 columns 属性,特征名保存在 colnames 中

    LarsCV

    最小角度回归模型(Least Angle Regression)交叉验证。
    # 删除第二步中不重要的特征
    # X = X.drop('sex', axis=1)
    # 实例化
    larscv = LarsCV(cv=5, normalize=False)
    # 训练模型
    larscv.fit(X, Y)
    # LarsCV 没有 ranking_ 属性,这里按系数绝对值排序(与前面的线性模型保持一致)
    ranks["LarsCV"] = ranking(np.abs(larscv.coef_), colnames)
    # 输出r方和估计alpha值
    print(larscv.score(X, Y))
    print(larscv.alpha_)
    以上是两个交叉验证,在对特征重要性要求较高时可以使用。 因运行时间有点长,这里大家可以自行运行得到结果。

    创建特征排序矩阵

    创建一个空字典来存储所有分数,并求其平均值。
    r = {}
    for name in colnames:
        r[name] = round(np.mean([ranks[method][name] for method in ranks.keys()]), 2)
    methods = sorted(ranks.keys())
    ranks["Mean"] = r
    methods.append("Mean")
    print("\t%s" % "\t".join(methods))
    for name in colnames:
        print("%s\t%s" % (name, "\t".join(map(str, [ranks[method][name] for method in methods]))))

                Lasso  LinReg  RF    RFE   Ridge  rlasso/Stability  Mean
    Open        1.0    1.0     0.02  0.91  0.47   1.0               0.73
    High        0.14   0.0     0.1   0.36  0.06   1.0               0.28
    Low         0.02   0.0     0.08  0.73  0.05   0.76              0.27
    Close       0.14   0.0     0.64  0.55  0.32   1.0               0.44
    Adj Close   0.02   1.0     1.0   0.82  1.0    0.99              0.8
    Volume      0.0    0.0     0.0   0.0   0.0    0.0               0.0
    H-L         0.0    0.0     0.0   0.45  0.01   0.0               0.08
    O-C         0.85   1.0     0.0   1.0   0.53   1.0               0.73
    3day MA     0.0    0.0     0.0   0.27  0.01   1.0               0.21
    10day MA    0.0    0.0     0.02  0.09  0.0    0.27              0.06
    30day MA    0.0    0.0     0.0   0.18  0.0    0.75              0.16
    Std_dev     0.0    0.0     0.0   0.64  0.01   0.0               0.11

    绘制特征重要性排序图

    将平均得到创建DataFrame数据框,从高到低排序,并利用可视化方法将结果展示出。 这样就一目了然,每个特征重要性大小。 meanplot = pd.DataFrame(list(r.items()), columns= ['Feature','Mean Ranking']) # 排序 meanplot = meanplot.sort_values('Mean Ranking', ascending=False) g=sns.factorplot(x="Mean Ranking", y="Feature", data = meanplot, kind="bar", size=14, aspect=1.9, palette='coolwarm') 推荐阅读 1、一文详解 RNN 股票预测实战(Python代码) 2、关于“数据分析”如何快速入门一些基本思路 3、超级攻略!Pandas\NumPy\Matrix 用于金融数据准备

    Handy Automation Scripts

    Your Friend For Reading Articles
    One-Click Sketching
    Stay Up With Top Headlines
    Stocks Updates On The Start
    Bulk Email Sender
    No Time For EDA
    Smart Login To Different Sites
    Be Safe & Watermark Your Images
    Remember That
    Google Scraper
    Converting PDF To Audio Files
    Playing Random Music From The List
    No BookMarks Anymore
    Getting Wikipedia Information
    Smart Weather Information
    Sending Emails With Attachment
    Shorting URLs
    Downloading Youtube Videos
    Cleaning Download Folder
    Sending Text Messages
    Converting hours to seconds
    Raising a number to the power
    If/else statement
    Convert images to JPEG
    Download Google images
    Read battery level of Bluetooth device
    Delete Telegram messages
    Get song lyrics
    Heroku hosting
    Github activity
    Removing duplicate code
    Sending emails
    Find specific files on your system
    Generating random passwords
    Print odd numbers
    Get date value
    Removing items from a list
    Count list items
    Text grabber
    Tweet search


    Automation scripts you need to try.

    Your Friend For Reading Articles

    This automation script scrapes the article content from medium and then reads it loud and clear. If you change the script a little bit then it can be used to read articles from other websites too. I use this script when I am not in the mood to read but to listen. Libraries:- Beautiful Soup is a Python package for parsing HTML and XML documents. requests Let’s You Establish a Connection Between Client and Server With Just One Line of Code. Pyttsx3, converts text into speech, with control over rate, frequency, and voice. import pyttsx3 import requests from bs4 import BeautifulSoup engine = pyttsx3.init('sapi5') voices = engine.getProperty('voices') newVoiceRate = 130 ## Reduce The Speech Rate engine.setProperty('rate',newVoiceRate) engine.setProperty('voice', voices[1].id) def speak(audio): engine.say(audio) engine.runAndWait() text = str(input("Paste article\n")) res = requests.get(text) soup = BeautifulSoup(res.text,'html.parser') articles = [] for i in range(len(soup.select('.p'))): article = soup.select('.p')[i].getText().strip() articles.append(article) text = " ".join(articles) speak(text) # engine.save_to_file(text, 'test.mp3') ## If you want to save the speech as a audio file engine.runAndWait()

    Script Applications:-

    AudioBooks, Reading Wikipedia Articles Aloud, Q&A Bots

    One-Click Sketching

    I just love this script. It lets you convert your amazing images into a pencil sketch with a few lines of code. You can use this script to impress someone by gifting them their pencil sketch. Libraries:- Opencv, is a python library that is designed to solve Computer Vision problems. It has many inbuilt methods to perform the biggest tasks in fewer lines of code. """ Photo Sketching Using Python """ import cv2 img = cv2.imread("elon.jpg") ## Image to Gray Image gray_image = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY) ## Gray Image to Inverted Gray Image inverted_gray_image = 255-gray_image ## Blurring The Inverted Gray Image blurred_inverted_gray_image = cv2.GaussianBlur(inverted_gray_image, (19,19),0) ## Inverting the blurred image inverted_blurred_image = 255-blurred_inverted_gray_image ### Preparing Photo sketching sketck = cv2.divide(gray_image, inverted_blurred_image,scale= 256.0) cv2.imshow("Original Image",img) cv2.imshow("Pencil Sketch", sketck) cv2.waitKey(0) Result — Image By Author

    Script Applications:-

    Building OCR Software, Detecting Number Plates, Detecting Edges, Creating Funky Images

    Stay Up With Top Headlines

    Everyone wants to stay up to date with the latest and trending news of your country. This automation script can do the work for you. It uses an external API to extract all the trending news of your country, state, city, etc. This script increases productivity and knowledge. The external API that is used in the script is news API by google. It offers the latest and trending news, different articles about a particular topic like tesla, business headlines, articles published by a journal, trending news between a timeline, etc. Libraries:- Pyttsx3 is a text-to-speech Library In Python. & Requests. import pyttsx3 import requests engine = pyttsx3.init('sapi5') voices = engine.getProperty('voices') engine.setProperty('voice', voices[0].id) def speak(audio): engine.say(audio) engine.runAndWait() def trndnews(): url = " http://newsapi.org/v2/top-headlines?country=us&apiKey=GET_YOUR_OWN" page = requests.get(url).json() article = page["articles"] results = [] for ar in article: results.append(ar["title"]) for i in range(len(results)): print(i + 1, results[i]) speak(results) trndnews()

    Script Applications:-

    ML Fake News Detection.

    Stocks Updates On The Start

    Buying and selling stocks is one of the trendiest ways of earning money nowadays. A stock, also known as equity, represents ownership of a fraction of a corporation. This automation script gives you the live price of a stock whenever you open your desktop, and the same script can pull several years of historical data for better knowledge of the stock. To run the script on startup, simply add it to the Windows startup folder: press Win+R, type shell:startup, and paste your script there. Libraries:- yfinance, yahoo_fin
    ''' Live price of the stock '''
    from yahoo_fin import stock_info
    live_price = stock_info.get_live_price("TSLA")
    print(round(live_price, 2), " USD")

    ''' Stock price from 2019 to 2021 '''
    import yfinance as yf
    stockSymbol = 'TSLA'
    stockData = yf.Ticker(stockSymbol)
    stockDf_past_2 = stockData.history(period='5d', start='2019-1-1', end='2021-12-31')
    print(stockDf_past_2)

    Script Applications:-

    This Script Can Be Used For Creating Algo Trading Bots, Stock Analysis, Researches, etc.

    Bulk Email Sender

    In My Previous Article About Automation Scripts, I talked about how you can automate sending emails with attachments. This automation script is a level up to that script. It allows you to send multiple emails at a time with the same or different data, and messages. Libraries:- Email, is a python library that is used to manage emails. Smtlib, defines a session object over which we can send emails and files. Pandas, Reading the CSV or Excel file. import smtplib from email.message import EmailMessage import pandas as pd def send_email(remail, rsubject, rcontent): email = EmailMessage() ## Creating a object for EmailMessage email['from'] = 'The Pythoneer Here' ## Person who is sending email['to'] = remail ## Whom we are sending email['subject'] = rsubject ## Subject of email email.set_content(rcontent) ## content of email with smtplib.SMTP(host='smtp.gmail.com',port=587)as smtp: smtp.ehlo() ## server object smtp.starttls() ## used to send data between server and client smtp.login(SENDER_EMAIL,SENDER_PSWRD) ## login id and password of gmail smtp.send_message(email) ## Sending email print("email send to ",remail) ## Printing success message if __name__ == '__main__': df = pd.read_excel('list.xlsx') length = len(df)+1 for index, item in df.iterrows(): email = item[0] subject = item[1] content = item[2] send_email(email,subject,content)

    Script Applications:-

    Can Be Used For Sending Newsletters. Stay Connected With All Your Clients.

    No Time For EDA

    Eda(exploratory data analysis) refers to the initial investigation done to understand the data more clearly. It is one of the most important stages of the data science project lifecycle. It is also referred to as the decision-making stage because, by the output analysis of this stage model, algorithms, parameters, weights everything is chosen. Anyone who knows a little bit about data science will agree with me that EDA is a time-consuming process. Well, not anymore. This automation script used an amazing library Dtale and generate a quick summary report of the data given to it with just one line of code. There are also many similar libraries that can also generate a quick summary like Dtale for example Autoviz, Sweetviz, etc. import seaborn as sns ### Printing Inbuilt Datasets of Seaborn Library print(sns.get_dataset_names()) ### Loading Titanic Dataset df=sns.load_dataset('titanic') ### Importing The Library import dtale #### Generating Quick Summary dtale.show(df)

    Script Applications:-

    Gives a Quick Review About The Dataset. Best for beginners.

    Smart Login To Different Sites

    To prevent yourself from hackers you should always log out from your social media account like Facebook, Twitter, Instagram, etc. Once you are done with your session. Entering use id and password each time is not very joyful work to do. This automation script will log in to different sites for you and once you are done the session is closed automatically. Libraries:- Selenium is an open-source web automation tool used for testing and automation. from selenium import webdriver from selenium.webdriver.common.keys import Keys import time PATH = 'chromedriver.exe' ##Same Directory as Python Program driver = webdriver.Chrome(executable_path=PATH) ##### Login Functions def login_fb(fid,fpsd): driver.get("https://www.facebook.com/") def login(id,password): email = driver.find_element_by_id("email") email.send_keys(id) Password = driver.find_element_by_id("pass") Password.send_keys(password) button = driver.find_element_by_id("u_0_d_Dw").click() pass login(fid,fpsd) ### Like Facebook Write Login Function For Other Platforms Too. def login_insta(): pass def login_medium(): pass def login_twitter(): pass def login_linkedin(): pass login_fb("YOUR_LOGIN_ID", "YOUR_PASSWORD") login_insta() login_medium() login_twitter() login_linkedin() Related Article This Automation Script Saves Time, and Increase Productivity.

    Be Safe & Watermark Your Images

    Internet is filled with digital thieves, who always look for other people’s work to use it as their own without giving proper attribution. Images are one of the most stoled properties on the internet. You clicked a masterpiece, upload it on the internet to showcase it to the world and some thief come and stole it and published it with their own name. To prevent this you should always watermark all images with your unique sign. This automation script will do the work for you. Libraries:- Opencv Process:- We are basically overlaying one image (watermark) on top of another image (original image) with center coordinates. with little changes and a loop, you can watermark hundreds of images in minutes. import cv2 watermark = cv2.imread("watermark.png") img = cv2.imread("no-problem.jpg") h_img, w_img, _ = img.shape center_x = int(w_img/2) center_y = int(h_img/2) h_watermark, w_watermark, _ = watermark.shape top_y = center_y - int(h_watermark/2) left_x = center_x - int(w_watermark/2) bottom_y = top_y + h_watermark right_x = left_x + w_watermark position = img[top_y:bottom_y, left_x:right_x] result = cv2.addWeighted(position, 1, watermark, 0.5, 0) img[top_y:bottom_y, left_x:right_x] = result cv2.imwrite("watermarked_image.jpg", img) cv2.imshow("Image With Watermark", img) cv2.waitKey(0) cv2.destroyAllWindows()

    Script Applications:-

    Overlaying Two Images. Image Filtering & Masking.

    Remember That

    Sometimes when working on a project you get disturbed by some other task that also needs to be done the same day and most of the time you forgot it. Now anymore, this script will remember everything for you and remind you about it after a certain time as a desktop notification. Libraries:- win10toast is python library that sends a desktop notification. from win10toast import ToastNotifier import time toaster = ToastNotifier() header = input("What You Want Me To Remember\n") text = input("Releated Message\n") time_min=float(input("In how many minutes?\n")) time_min = time_min * 60 print("Setting up reminder..") time.sleep(2) print("all set!") time.sleep(time_min) toaster.show_toast(f"{header}", f"{text}", duration=10, threaded=True) while toaster.notification_active(): time.sleep(0.005)

    Google Scraper

    Google is one of the biggest and most used search engines. There are over 3.8 million searches done per minute around the globe. Most of them are just queries that get answered on the first result page. This script will scrape the results from google search and generate answers without even going to actual google. Libraries:- requests, BeautifulSoup, and Tkinter a GUI library in python. Process:- At first with the help of Tkinter, a GUI is created that is used to take the query of the user. Once the user entered the query it is sent to the google scraper function that scrapes the results based on the query and generates the answers. Then with the help of the showinfo class in Tkinter, the results are shown as a pop-up notification. from tkinter import * from tkinter.messagebox import showinfo from bs4 import BeautifulSoup import requests headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'} def action(): ### Code For Receiving Query query=textF.get() textF.delete(0,END) print(query) def google(query): query = query.replace(" ","+") try: url = f'https://www.google.com/search?q={query}&oq={query}&aqs=chrome..69i57j46j69i59j35i39j0j46j0l2.4948j0j7&sourceid=chrome&ie=UTF-8' res = requests.get(url,headers=headers) soup = BeautifulSoup(res.text,'html.parser') except: print("Make sure you have a internet connection") try: try: ans = soup.select('.RqBzHd')[0].getText().strip() except: try: title=soup.select('.AZCkJd')[0].getText().strip() try: ans=soup.select('.e24Kjd')[0].getText().strip() except: ans="" ans=f'{title}\n{ans}' except: try: ans=soup.select('.hgKElc')[0].getText().strip() except: ans=soup.select('.kno-rdesc span')[0].getText().strip() except: ans = "can't find on google" return ans result = google(str(query)) showinfo(title="Result For Your Query", message=result) main = Tk() main.geometry("300x100") main.title("Karl") top = Frame(main) top.pack(side=TOP) textF = Entry(main,font=("helvetica",14,"bold")) textF.focus() textF.pack(fill=X,pady=5) textF.insert(0,"Enter your query") textF.configure(state=DISABLED) def on_click(event): textF.configure(state=NORMAL) textF.delete(0,END) textF.unbind('<Button-1>',on_click_id) on_click_id = textF.bind('<Button-1>',on_click) btn = Button(main,text="Search",font=("Verdana",16),command=action) btn.pack() main.mainloop()

    Converting PDF To Audio Files

    This automation task is one of my favorites. I use it almost every day. Here our task is to write a python script that can convert pdfs into audio files. Libraries:- PyPDF, is a library in python that is used to read text from a pdf file. Pyttsx3, is a text-to-speech convert library. Process:- We first use the PyPDF library to read text from the pdf file and then we convert the text into speech and save it as an audio file. import pyttsx3,PyPDF2 pdfreader = PyPDF2.PdfFileReader(open('story.pdf','rb')) speaker = pyttsx3.init() for page_num in range(pdfreader.numPages): text = pdfreader.getPage(page_num).extractText() ## extracting text from the PDF cleaned_text = text.strip().replace('\n',' ') ## Removes unnecessary spaces and break lines print(cleaned_text) ## Print the text from PDF #speaker.say(cleaned_text) ## Let The Speaker Speak The Text speaker.save_to_file(cleaned_text,'story.mp3') ## Saving Text In a audio file 'story.mp3' speaker.runAndWait() speaker.stop()

    Script Applications:-

    Audiobooks. Storyteller. By Adding a Little Bit of Web Scraping, The Same Script Can Be Used To Read Articles From Sites Like Medium and WordPress.

    Playing Random Music From The List

    I have a good collection of songs that I love to listen to while working on my projects. For a music lover like me, this script is very useful. It randomly picks a song from a folder of songs. Libraries:- os, a module in Python that deals with operating-system tasks like opening, deleting, renaming, and closing files; random, a module that provides randomness. Process:- With the help of the os module we detect all the music files inside the folder and store them in a list, then we generate a random index in the range of the folder's length and use it to run that music file with the os.startfile() function.
    import os
    import random

    music_dir = 'G:\\new english songs'
    songs = os.listdir(music_dir)
    song = random.randint(0, len(songs) - 1)   # a valid index into the list
    print(songs[song])  ## Prints The Song Name
    os.startfile(os.path.join(music_dir, songs[song]))  # play the picked song, not always the first one

    Script Features:-

Playing music or videos; it can also be used to open a random file of any kind inside a folder.

    No BookMarks Anymore

Every day before going to bed I search the internet for some good content to read the next day. Most of the time I bookmark the website or article I come across, but my bookmarks have grown so much that I now have over 100 of them across my browsers. So I figured out a different way to tackle this problem with Python: now I copy-paste the links into a text file, and every morning I run a script that opens all of those websites in my browser again. Libraries: webbrowser, a standard-library module that opens URLs in the default browser. Process: the script reads the URLs from the file and opens each one in the browser with the webbrowser module, as in the sketch below.
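The article does not include the snippet itself, so here is a minimal sketch of the idea, assuming one URL per line in a hypothetical file named reading_list.txt:

import webbrowser

# Assumes a plain-text file with one URL per line (the file name is hypothetical)
with open("reading_list.txt") as f:
    for url in f:
        url = url.strip()
        if url:
            webbrowser.open_new_tab(url)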

    Getting Wikipedia Information

Wikipedia is a great source of knowledge and information. This script lets you fetch information from Wikipedia directly from your command line. Libraries: wikipedia is a Python library that makes parsing data from Wikipedia super easy. Working: the script takes a query, fetches a short summary from Wikipedia for it, and then speaks the result out loud.

import wikipedia
import pyttsx3

engine = pyttsx3.init('sapi5')
voices = engine.getProperty('voices')
engine.setProperty('voice', voices[0].id)

def speak(audio):
    engine.say(audio)
    engine.runAndWait()

query = input("What You Want To Ask ??")
results = wikipedia.summary(query, sentences=2)
speak("According to Wikipedia\n")
print(results)
speak(results)

    Smart Weather Information

No one wants to get stuck in the rain or heavy snowfall, and everyone wants to stay updated with the weather forecast. This automation script sends the weather information as a desktop notification whenever you turn on your PC. Libraries: requests makes sending HTTP requests simple; Beautiful Soup is a Python package for parsing HTML and XML documents; ToastNotifier (from the win10toast package) sends the desktop notification.

import time
import requests
from bs4 import BeautifulSoup
from win10toast import ToastNotifier

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}

def weather(city):
    city = city.replace(" ", "+")
    res = requests.get(f'https://www.google.com/search?q={city}&oq={city}&aqs=chrome.0.35i39l2j0l4j46j69i60.6128j1j7&sourceid=chrome&ie=UTF-8', headers=headers)
    soup = BeautifulSoup(res.text, 'html.parser')
    location = soup.select('#wob_loc')[0].getText().strip()
    current_time = soup.select('#wob_dts')[0].getText().strip()
    info = soup.select('#wob_dc')[0].getText().strip()
    weather = soup.select('#wob_tm')[0].getText().strip()
    information = f"{location} \n {current_time} \n {info} \n {weather} °C "
    toaster = ToastNotifier()
    toaster.show_toast("Weather Information", f"{information}", duration=10, threaded=True)
    while toaster.notification_active():
        time.sleep(0.005)

# print("enter the city name")
# city = input()
city = "London"
city = city + " weather"
weather(city)

    Sending Emails With Attachment

As a freelancer I need to send multiple emails every day that look almost the same with only small differences. This script sends several emails at once with different names and content. Libraries: email is the standard-library package for building email messages; smtplib defines a session object over which we can send emails and attachments.

import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.mime.base import MIMEBase
from email import encoders

body = '''
Hello, Admin

I am attaching the sales files with this email.
This year we got a whopping 200% profit on our sales.

Regards,
Team Sales
xyz.com
'''

# Sender email address and password
senders_email = 'deltadelta371@gmail.com'
sender_password = 'delta@371'
receiver_email = 'parasharabhay13@gmail.com'

# MIME setup
message = MIMEMultipart()
message['From'] = senders_email
message['To'] = receiver_email
message['Subject'] = 'Sales Report 2021 -- Team Sales'
message.attach(MIMEText(body, 'plain'))

# File attachment
attach_file_name = 'car-sales.csv'
attach_file = open(attach_file_name, 'rb')
payload = MIMEBase('application', 'octet-stream')
payload.set_payload(attach_file.read())
encoders.encode_base64(payload)
payload.add_header('Content-Disposition', 'attachment', filename=attach_file_name)
message.attach(payload)

# SMTP connection for sending the email
session = smtplib.SMTP('smtp.gmail.com', 587)  # use Gmail with port 587
session.starttls()                             # enable security
session.login(senders_email, sender_password)  # log in with mail id and password
text = message.as_string()
session.sendmail(senders_email, receiver_email, text)
session.quit()
print('Mail Sent')

Shortening URLs

    Sometimes those big URLs become very annoying to read and share. This script uses an external API to short the URL. from __future__ import with_statement import contextlib try: from urllib.parse import urlencode except ImportError: from urllib import urlencode try: from urllib.request import urlopen except ImportError: from urllib2 import urlopen import sys def make_tiny(url): request_url = ('http://tinyurl.com/api-create.php?' + urlencode({'url':url})) with contextlib.closing(urlopen(request_url)) as response: return response.read().decode('utf-8') def main(): for tinyurl in map(make_tiny, sys.argv[1:]): print(tinyurl) if __name__ == '__main__': main() ''' -----------------------------OUTPUT------------------------ python url_shortener.py https://www.wikipedia.org/ https://tinyurl.com/buf3qt3 '''

    Downloading Youtube Videos

    I use youtube for 2–3 hours every day sometimes even more. Most of my learnings come from youtube because it is free and contains a vast amount of information. There are certain videos that stand out from others that I want to store with me to watch later even when I don’t have an internet connection. This script does the job for me, by downloading the youtube video for me. It uses an external API to do the job. Libraries:- pytube, is a lightweight Python library for downloading youtube videos. Tkinter, is one of the most famous and useful GUI Development Library That Makes It Super Easy to Create Awesome GUIs With Fewer Efforts. Why Tkinter:- The Whole Concept of the script is to create an interface through which you can download youtube videos by just putting a link. That Interface can’t be our CLI so we are going to create a simple GUI for our script. You Can make it even better by running your python code without a console with just one click.
The core download code (the Tkinter wrapper described above is sketched separately below):

import pytube

try:
    video_url = 'https://www.youtube.com/watch?v=lTTajzrSkCw'
    youtube = pytube.YouTube(video_url)
    video = youtube.streams.first()
    video.download('C:/Users/abhay/Desktop/')
    print("Download Successful !!")
except:
    print("Something Went Wrong !!")
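The article promises a simple Tkinter interface but does not show it; here is a minimal sketch of such a wrapper, assuming pytube is installed (the widget names are my own, not from the original):

from tkinter import Tk, Label, Entry, Button
import pytube

def download():
    link = url_entry.get()
    # Download the highest-resolution progressive stream to the current folder
    pytube.YouTube(link).streams.get_highest_resolution().download()
    status.config(text="Download finished")

root = Tk()
root.title("YouTube Downloader")
Label(root, text="Paste a YouTube link:").pack()
url_entry = Entry(root, width=50)
url_entry.pack(pady=5)
Button(root, text="Download", command=download).pack()
status = Label(root, text="")
status.pack()
root.mainloop()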

    Cleaning Download Folder

One of the messiest places in this world is a developer's download folder. While writing a blog post or working on a project we download images and save them with ugly, meaningless names like asdfg.jpg. This Python script cleans your download folder by renaming (and optionally deleting) files based on some condition. Libraries: os.

import os

folder_location = 'C:\\Users\\user\\Downloads\\demo'
os.chdir(folder_location)
list_of_files = os.listdir()

## Selecting all images
images = [content for content in list_of_files if content.endswith(('.png', '.jpg', '.jpeg'))]
for index, image in enumerate(images):
    os.rename(image, f'{index}.png')

## Deleting all images
################## Write your script here -- try to create your own code (one possible version is sketched below)
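The deletion step is left as an exercise in the original. A minimal sketch, under the assumption that "clean up" means removing renamed images older than 30 days (the threshold is arbitrary and the script is already chdir'd into the download folder):

import os, time

cutoff = time.time() - 30 * 24 * 3600   # 30 days ago; pick whatever rule suits you
for name in os.listdir():
    if name.endswith(('.png', '.jpg', '.jpeg')) and os.path.getmtime(name) < cutoff:
        os.remove(name)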

    Sending Text Messages

There are many free text-message services available on the internet, such as Twilio and Fast2SMS. Fast2SMS provides 50 free messages and a prebuilt template for connecting your script to its API. This script lets us send an SMS to any number directly from our command-line interface.

import requests
import json

def send_sms(number, message):
    url = 'https://www.fast2sms.com/dev/bulk'
    params = {
        'authorization': 'FIND_YOUR_OWN',
        'sender_id': 'FSTSMS',
        'message': message,
        'language': 'english',
        'route': 'p',
        'numbers': number
    }
    response = requests.get(url, params=params)
    dic = response.json()
    # print(dic)
    return dic.get('return')

num = int(input("Enter The Number:\n"))
msg = input("Enter The Message You Want To Send:\n")
s = send_sms(num, msg)
if s:
    print("Successfully sent")
else:
    print("Something went wrong..")

Converting seconds to hours, minutes and seconds

When working on projects that require you to turn a number of seconds into an hours:minutes:seconds display, you can use the following Python script (a small companion for the opposite direction follows it).

def convert(seconds):
    seconds = seconds % (24 * 3600)
    hour = seconds // 3600
    seconds %= 3600
    minutes = seconds // 60
    seconds %= 60
    return "%d:%02d:%02d" % (hour, minutes, seconds)

# Driver program
n = 12345
print(convert(n))   # 3:25:45
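For the reverse direction (hours to seconds) the conversion is a one-liner; a small sketch:

def hours_to_seconds(hours):
    # 1 hour = 3600 seconds
    return int(hours * 3600)

print(hours_to_seconds(2.5))   # 9000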

    Raising a number to the power

Another popular Python script calculates the power of a number, for example 2 to the power of 4. There are at least three ways to do it: the ** operator, the built-in pow(), and math.pow(). Here is the script.

import math

# Assign values to x and n
x = 4
n = 3

# Method 1
power = x ** n
print("%d to the power %d is %d" % (x, n, power))

# Method 2
power = pow(x, n)
print("%d to the power %d is %d" % (x, n, power))

# Method 3 (math.pow always returns a float)
power = math.pow(x, n)
print("%d to the power %d is %5.2f" % (x, n, power))

    If/else statement

This is arguably one of the most used statements in Python. It lets your code run a block only when a certain condition is met, and unlike many other languages you don't need curly braces. Here is a simple if/else script.

# Assign a value
number = 50

# Check whether the number is at least 50
if (number >= 50):
    print("You have passed")
else:
    print("You have not passed")

    Convert images to JPEG

Many systems only accept JPEG images and reject formats such as PNG, so you will sometimes need to convert files. Luckily, there's a Python script that automates this process.

import os
import sys
from PIL import Image

if len(sys.argv) > 1:
    if os.path.exists(sys.argv[1]):
        im = Image.open(sys.argv[1])
        target_name = sys.argv[1] + ".jpg"
        rgb_im = im.convert('RGB')
        rgb_im.save(target_name)
        print("Saved as " + target_name)
    else:
        print(sys.argv[1] + " not found")
else:
    print("Usage: convert2jpg.py <file>")

    Download Google images

    If you are working on a project that demands many images, there’s a Python script that enables you to do so. With it, you can download hundreds of images simultaneously. However, you should avoid violating copyright terms. Click here for more information.

    Read battery level of Bluetooth device

    This script allows you to read the battery level of your Bluetooth headset. This is especially crucial if the level does not display on your PC. However, it does not support all Bluetooth headsets. For it to run, you need to have Docker on your system. Click here for more information.

    Delete Telegram messages

    Let’s face it, messaging apps do chew up much of your device’s storage space. And Telegram is no different. Luckily, this script allows you to delete all supergroups messages. You need to enter the supergroup’s information for the script to run. Click here for more information.

    Get song lyrics

    This is yet another popular Python script that enables you to scrape lyrics from the Genius site. It primarily works with Spotify, however, other media players with DBus MediaPlayer2 can also use the script. With it, you can sing along to your favorite song. Click here for more information.

    Heroku hosting

    Heroku is one of the most preferred hosting services. Used by thousands of developers, it allows you to build apps for free. Likewise, you can host your Python applications and scripts on Heroku with this script. Click here for more information.

    Github activity

    If you contribute to open source projects, keeping a record of your contributions is recommended. Not only do you track your contributions, but also appear professional when displaying your work to other people. With this script, you can generate a robust activity graph. Click here for information.

Removing duplicates from a list

    When creating large apps or working on projects, it is normal to have duplicates in your list. This not only makes coding strenuous, but also makes your code appear unprofessional. With this script, you can remove duplicates seamlessly.
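The snippet itself is not included in the article; a minimal sketch of one common way to de-duplicate a list while keeping the original order:

items = ["pandas", "numpy", "pandas", "flask", "numpy"]
unique_items = list(dict.fromkeys(items))   # dict keys are unique and keep insertion order
print(unique_items)                         # ['pandas', 'numpy', 'flask']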

    Sending emails

    Emails are crucial to any businesses’ communication avenues. With Python, you can enable sites and web apps to send them without hiccups. However, businesses do not want to send each email manually, instead, they prefer to automate the process. This script allows you to choose which emails to reply to.
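No code is shown for this one either. As a rough sketch of the idea, assuming an IMAP/SMTP mailbox and purely illustrative server names, credentials and filter, an auto-reply loop might look like this:

import imaplib, smtplib, email
from email.mime.text import MIMEText

IMAP_HOST, SMTP_HOST = "imap.example.com", "smtp.example.com"   # hypothetical servers
USER, PASSWORD = "me@example.com", "app-password"               # hypothetical credentials

imap = imaplib.IMAP4_SSL(IMAP_HOST)
imap.login(USER, PASSWORD)
imap.select("INBOX")

# Choose which emails to reply to: here, unread messages whose subject mentions "invoice"
_, data = imap.search(None, '(UNSEEN SUBJECT "invoice")')
for num in data[0].split():
    _, msg_data = imap.fetch(num, "(RFC822)")
    original = email.message_from_bytes(msg_data[0][1])
    reply = MIMEText("Thanks, we received your message and will get back to you shortly.")
    reply["To"] = original["From"]
    reply["From"] = USER
    reply["Subject"] = "Re: " + (original["Subject"] or "")
    with smtplib.SMTP(SMTP_HOST, 587) as smtp:
        smtp.starttls()
        smtp.login(USER, PASSWORD)
        smtp.send_message(reply)

imap.logout()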

    Find specific files on your system

Often you forget the names or locations of files on your system, which is not only annoying but also wastes time navigating through folders. While there are programs that help you search for files, it is handy to have one that automates the process. This script lets you choose which files and file types to search for; for example, if you want to find MP3 files you can use it like this.

import fnmatch
import os

rootPath = '/'
pattern = '*.mp3'

for root, dirs, files in os.walk(rootPath):
    for filename in fnmatch.filter(files, pattern):
        print(os.path.join(root, filename))

    Generating random passwords

    Passwords bolster the privacy of app and website users. Besides, they prevent fraudulent use of accounts by cyber criminals. As such, you need to create an app or website that can generate random strong passwords. With this script, you can seamlessly generate them. import string from random import * characters = string.ascii_letters + string.punctuation + string.digits password = "".join(choice(characters) for x in range(randint(8, 16))) print (password)

    Print odd numbers

    Some projects may require you to print odd numbers within a specific range. While you can do this manually, it is time-consuming and prone to error. This means you need a program that can automate the process. Thanks to this script, you can achieve this.
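The snippet is not reproduced in the article; a minimal sketch, with the range chosen arbitrarily:

start, end = 1, 20                                      # hypothetical range
odds = [n for n in range(start, end + 1) if n % 2 != 0]
print(odds)                                             # [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]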

    Get date value

    Python allows you to format a date value in numerous ways. With the DateTime module, this script allows you to read the current date and set a custom value.
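Again the script itself is not shown; a small sketch with the datetime module, where the custom value is just an example:

from datetime import datetime

now = datetime.now()
print(now.strftime("%d-%m-%Y %H:%M:%S"))   # read and format the current date

custom = datetime(2021, 7, 16, 9, 30)      # set a custom value
print(custom.isoformat())                  # 2021-07-16T09:30:00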

    Removing items from a list

You'll often have to modify lists in your projects. Python lets you do this with the insert() and remove() methods. Here is a script you can use to achieve this.

# Declare a fruit list
fruits = ["Mango", "Orange", "Guava", "Banana"]

# Insert an item in the 2nd position
fruits.insert(1, "Grape")

# Display the list after inserting
print("The fruit list after insert:")
print(fruits)

# Remove an item
fruits.remove("Guava")

# Print the list after delete
print("The fruit list after delete:")
print(fruits)

    Count list items

    Using the count() method, you can print how many times a string appears in another string. You need to provide the string that Python will search. Here is a script to help you do so. # Define the string string = 'Python Bash Java PHP PHP PERL' # Define the search string search = 'P' # Store the count value count = string.count(search) # Print the formatted output print("%s appears %d times" % (search, count))

    Text grabber

    With this Python script, you can take a screenshot and copy the text in it. Click here for more information.
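The linked code is not included here; a rough sketch of the screenshot-then-OCR idea, assuming pyautogui, pytesseract (plus the Tesseract binary itself) and pyperclip are installed:

import pyautogui, pytesseract, pyperclip

shot = pyautogui.screenshot()              # grab the screen as a PIL image
text = pytesseract.image_to_string(shot)   # run OCR on it
pyperclip.copy(text)                       # put the recognised text on the clipboard
print(text)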

    Tweet search

    Ever searched for a tweet to no avail? Annoying, right! Well, why not use this script and let it do the legwork for you.

    反爬虫代码 直接炸了爬虫服务器

    很多人的爬虫是使用Requests来写的,如果你阅读过Requests的文档,那么你可能在文档中的Binary Response Content[1]这一小节,看到这样一句话: The gzip and deflate transfer-encodings are automatically decoded for you. (Request)会自动为你把gzip和deflate转码后的数据进行解码网站服务器可能会使用gzip压缩一些大资源,这些资源在网络上传输的时候,是压缩后的二进制格式。 客户端收到返回以后,如果发现返回的Headers里面有一个字段叫做Content-Encoding,其中的值包含gzip,那么客户端就会先使用gzip对数据进行解压,解压完成以后再把它呈现到客户端上面。 浏览器自动就会做这个事情,用户是感知不到这个事情发生的。 而requestsScrapy这种网络请求库或者爬虫框架,也会帮你做这个事情,因此你不需要手动对网站返回的数据解压缩。 这个功能原本是一个方便开发者的功能,但我们可以利用这个功能来做报复爬虫的事情。 我们首先写一个客户端,来测试一下返回gzip压缩数据的方法。 我首先在硬盘上创建一个文本文件text.txt,里面有两行内容,如下图所示: 然后,我是用gzip命令把它压缩成一个.gz文件: cattext.txt|gzip>data.gz 接下来,我们使用FastAPI写一个HTTP服务器server.pyfrom fastapi import FastAPI, Response from fastapi.responses import FileResponse app = FastAPI() @app.get('/') def index(): resp = FileResponse('data.gz') return resp 然后使用命令uvicorn server:app启动这个服务。 接下来,我们使用requests来请求这个接口,会发现返回的数据是乱码,如下图所示: 返回的数据是乱码,这是因为服务器没有告诉客户端,这个数据是gzip压缩的,因此客户端只有原样展示。 由于压缩后的数据是二进制内容,强行转成字符串就会变成乱码。 现在,我们稍微修改一下server.py的代码,通过Headers告诉客户端,这个数据是经过gzip压缩的: from fastapi import FastAPI, Response from fastapi.responses import FileResponse app = FastAPI() @app.get('/') def index(): resp = FileResponse('data.gz') resp.headers['Content-Encoding'] = 'gzip' # 说明这是gzip压缩的数据 return resp 修改以后,重新启动服务器,再次使用requests请求,发现已经可以正常显示数据了: 这个功能已经展示完了,那么我们怎么利用它呢? 这就不得不提到压缩文件的原理了。 文件之所以能压缩,是因为里面有大量重复的元素,这些元素可以通过一种更简单的方式来表示。 压缩的算法有很多种,其中最常见的一种方式,我们用一个例子来解释。 假设有一个字符串,它长成下面这样: 111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111我们可以用5个字符来表示: 192个1。 这就相当于把192个字符压缩成了5个字符,压缩率高达97.4%。 如果我们可以把一个1GB的文件压缩成1MB,那么对服务器来说,仅仅是返回了1MB的二进制数据,不会造成任何影响。 但是对客户端或者爬虫来说,它拿到这个1MB的数据以后,就会在内存中把它还原成1GB的内容。 这样一瞬间爬虫占用的内存就增大了1GB。 如果我们再进一步增大这个原始数据,那么很容易就可以把爬虫所在的服务器内存全部沾满,轻者服务器直接杀死爬虫进程,重则爬虫服务器直接死机。 你别以为这个压缩比听起来很夸张,其实我们使用很简单的一行命令就可以生成这样的压缩文件。 如果你用的是Linux,那么请执行命令: dd if=/dev/zero bs=1M count=1000 | gzip > boom.gz 如果你的电脑是macOS,那么请执行命令: dd if=/dev/zero bs=1048576 count=1000 | gzip > boom.gz 执行过程如下图所示: 生成的这个boom.gz文件只有995KB。 但是如果我们使用gzip -d boom.gz对这个文件解压缩,就会发现生成了一个1GB的boom文件,如下图所示: 只要大家把命令里面的count=1000改成一个更大的数字,就能得到更大的文件。 我现在把count改成10,给大家做一个演示(不敢用1GB的数据来做测试,害怕我的Jupyter崩溃)。 生成的boom.gz文件只有10KB: 服务器返回一个10KB的二进制数据,没有任何问题。 现在我们用requests去请求这个接口,然后查看一下resp这个对象占用的内存大小: 可以看到,由于requests自动会对返回的数据解压缩,因此最终获得的resp对象竟然有10MB这么大。 如果大家想使用这个方法,一定要先确定这个请求是爬虫发的,再使用。 否则被你干死的不是爬虫而是真实用户就麻烦了。 本文的写作过程中,参考了文章网站gzip炸弹 网站gzip炸弹 http://da.dadaaierer.com/?p=577

    Fancier Output Formatting

    https://docs.python.org/3/tutorial/ So far we’ve encountered two ways of writing values: expression statements and the print() function. (A third way is using the write() method of file objects; the standard output file can be referenced as sys.stdout. See the Library Reference for more information on this.) Often you’ll want more control over the formatting of your output than simply printing space-separated values. There are several ways to format output. To use formatted string literals, begin a string with f or F before the opening quotation mark or triple quotation mark. Inside this string, you can write a Python expression between { and } characters that can refer to variables or literal values. >>> year = 2016 >>> event = "Referendum" >>> f"Results of the {year} {event}" "Results of the 2016 Referendum" The str.format() method of strings requires more manual effort. You’ll still use { and } to mark where a variable will be substituted and can provide detailed formatting directives, but you’ll also need to provide the information to be formatted. >>> yes_votes = 42_572_654 >>> no_votes = 43_132_495 >>> percentage = yes_votes / (yes_votes + no_votes) >>> "{:-9} YES votes {:2.2%}".format(yes_votes, percentage) " 42572654 YES votes 49.67%" Finally, you can do all the string handling yourself by using string slicing and concatenation operations to create any layout you can imagine. The string type has some methods that perform useful operations for padding strings to a given column width. When you don’t need fancy output but just want a quick display of some variables for debugging purposes, you can convert any value to a string with the repr() or str() functions. The str() function is meant to return representations of values which are fairly human-readable, while repr() is meant to generate representations which can be read by the interpreter (or will force a SyntaxError if there is no equivalent syntax). For objects which don’t have a particular representation for human consumption, str() will return the same value as repr(). Many values, such as numbers or structures like lists and dictionaries, have the same representation using either function. Strings, in particular, have two distinct representations. Some examples: >>> s = "Hello, world." >>> str(s) "Hello, world." >>> repr(s) ""Hello, world."" >>> str(1/7) "0.14285714285714285" >>> x = 10 * 3.25 >>> y = 200 * 200 >>> s = "The value of x is " + repr(x) + ", and y is " + repr(y) + "..." >>> print(s) The value of x is 32.5, and y is 40000... >>> # The repr() of a string adds string quotes and backslashes: ... hello = "hello, world\n" >>> hellos = repr(hello) >>> print(hellos) "hello, world\n" >>> # The argument to repr() may be any Python object: ... repr((x, y, ("spam", "eggs"))) "(32.5, 40000, ("spam", "eggs"))" The string module contains a Template class that offers yet another way to substitute values into strings, using placeholders like $x and replacing them with values from a dictionary, but offers much less control of the formatting. 7.1.1. Formatted String Literals Formatted string literals (also called f-strings for short) let you include the value of Python expressions inside a string by prefixing the string with f or F and writing expressions as {expression}. An optional format specifier can follow the expression. This allows greater control over how the value is formatted. 
The following example rounds pi to three places after the decimal: >>> import math >>> print(f"The value of pi is approximately {math.pi:.3f}.") The value of pi is approximately 3.142. Passing an integer after the ':' will cause that field to be a minimum number of characters wide. This is useful for making columns line up. >>> table = {"Sjoerd": 4127, "Jack": 4098, "Dcab": 7678} >>> for name, phone in table.items(): ... print(f"{name:10} ==> {phone:10d}") ... Sjoerd ==> 4127 Jack ==> 4098 Dcab ==> 7678 Other modifiers can be used to convert the value before it is formatted. '!a' applies ascii(), '!s' applies str(), and '!r' applies repr(): >>> animals = "eels" >>> print(f"My hovercraft is full of {animals}.") My hovercraft is full of eels. >>> print(f"My hovercraft is full of {animals!r}.") My hovercraft is full of "eels". The = specifier can be used to expand an expression to the text of the expression, an equal sign, then the representation of the evaluated expression: >>> bugs = "roaches" >>> count = 13 >>> area = "living room" >>> print(f"Debugging {bugs=} {count=} {area=}") Debugging bugs="roaches" count=13 area="living room" See self-documenting expressions for more information on the = specifier. For a reference on these format specifications, see the reference guide for the Format Specification Mini-Language. 7.1.2. The String format() Method Basic usage of the str.format() method looks like this: >>> print("We are the {} who say "{}!"".format("knights", "Ni")) We are the knights who say "Ni!" The brackets and characters within them (called format fields) are replaced with the objects passed into the str.format() method. A number in the brackets can be used to refer to the position of the object passed into the str.format() method. >>> print("{0} and {1}".format("spam", "eggs")) spam and eggs >>> print("{1} and {0}".format("spam", "eggs")) eggs and spam If keyword arguments are used in the str.format() method, their values are referred to by using the name of the argument. >>> print("This {food} is {adjective}.".format( ... food="spam", adjective="absolutely horrible")) This spam is absolutely horrible. Positional and keyword arguments can be arbitrarily combined: >>> print("The story of {0}, {1}, and {other}.".format("Bill", "Manfred", ... other="Georg")) The story of Bill, Manfred, and Georg. If you have a really long format string that you don’t want to split up, it would be nice if you could reference the variables to be formatted by name instead of by position. This can be done by simply passing the dict and using square brackets '[]' to access the keys. >>> table = {"Sjoerd": 4127, "Jack": 4098, "Dcab": 8637678} >>> print("Jack: {0[Jack]:d}; Sjoerd: {0[Sjoerd]:d}; " ... "Dcab: {0[Dcab]:d}".format(table)) Jack: 4098; Sjoerd: 4127; Dcab: 8637678 This could also be done by passing the table dictionary as keyword arguments with the ** notation. >>> table = {"Sjoerd": 4127, "Jack": 4098, "Dcab": 8637678} >>> print("Jack: {Jack:d}; Sjoerd: {Sjoerd:d}; Dcab: {Dcab:d}".format(**table)) Jack: 4098; Sjoerd: 4127; Dcab: 8637678 This is particularly useful in combination with the built-in function vars(), which returns a dictionary containing all local variables. As an example, the following lines produce a tidily aligned set of columns giving integers and their squares and cubes: >>> for x in range(1, 11): ... print("{0:2d} {1:3d} {2:4d}".format(x, x*x, x*x*x)) ... 
1 1 1 2 4 8 3 9 27 4 16 64 5 25 125 6 36 216 7 49 343 8 64 512 9 81 729 10 100 1000 For a complete overview of string formatting with str.format(), see Format String Syntax. 7.1.3. Manual String Formatting Here’s the same table of squares and cubes, formatted manually: >>> for x in range(1, 11): ... print(repr(x).rjust(2), repr(x*x).rjust(3), end=" ") ... # Note use of "end" on previous line ... print(repr(x*x*x).rjust(4)) ... 1 1 1 2 4 8 3 9 27 4 16 64 5 25 125 6 36 216 7 49 343 8 64 512 9 81 729 10 100 1000 (Note that the one space between each column was added by the way print() works: it always adds spaces between its arguments.) The str.rjust() method of string objects right-justifies a string in a field of a given width by padding it with spaces on the left. There are similar methods str.ljust() and str.center(). These methods do not write anything, they just return a new string. If the input string is too long, they don’t truncate it, but return it unchanged; this will mess up your column lay-out but that’s usually better than the alternative, which would be lying about a value. (If you really want truncation you can always add a slice operation, as in x.ljust(n)[:n].) There is another method, str.zfill(), which pads a numeric string on the left with zeros. It understands about plus and minus signs: >>> "12".zfill(5) "00012" >>> "-3.14".zfill(7) "-003.14" >>> "3.14159265359".zfill(5) "3.14159265359" 7.1.4. Old string formatting The % operator (modulo) can also be used for string formatting. Given 'string' % values, instances of % in string are replaced with zero or more elements of values. This operation is commonly known as string interpolation. For example: >>> import math >>> print("The value of pi is approximately %5.3f." % math.pi) The value of pi is approximately 3.142. More information can be found in the printf-style String Formatting section.

    Reading and Writing Files

    open() returns a file object, and is most commonly used with two positional arguments and one keyword argument: open(filename, mode, encoding=None) >>> f = open("workfile", "w", encoding="utf-8") The first argument is a string containing the filename. The second argument is another string containing a few characters describing the way in which the file will be used. mode can be 'r' when the file will only be read, 'w' for only writing (an existing file with the same name will be erased), and 'a' opens the file for appending; any data written to the file is automatically added to the end. 'r+' opens the file for both reading and writing. The mode argument is optional; 'r' will be assumed if it’s omitted. Normally, files are opened in text mode, that means, you read and write strings from and to the file, which are encoded in a specific encoding. If encoding is not specified, the default is platform dependent (see open()). Because UTF-8 is the modern de-facto standard, encoding="utf-8" is recommended unless you know that you need to use a different encoding. Appending a 'b' to the mode opens the file in binary mode. Binary mode data is read and written as bytes objects. You can not specify encoding when opening file in binary mode. In text mode, the default when reading is to convert platform-specific line endings (\n on Unix, \r\n on Windows) to just \n. When writing in text mode, the default is to convert occurrences of \n back to platform-specific line endings. This behind-the-scenes modification to file data is fine for text files, but will corrupt binary data like that in JPEG or EXE files. Be very careful to use binary mode when reading and writing such files. It is good practice to use the with keyword when dealing with file objects. The advantage is that the file is properly closed after its suite finishes, even if an exception is raised at some point. Using with is also much shorter than writing equivalent try-finally blocks: >>> with open("workfile", encoding="utf-8") as f: ... read_data = f.read() >>> # We can check that the file has been automatically closed. >>> f.closed True If you’re not using the with keyword, then you should call f.close() to close the file and immediately free up any system resources used by it. Warning Calling f.write() without using the with keyword or calling f.close() might result in the arguments of f.write() not being completely written to the disk, even if the program exits successfully. After a file object is closed, either by a with statement or by calling f.close(), attempts to use the file object will automatically fail. >>> f.close() >>> f.read() Traceback (most recent call last): File "<stdin>", line 1, in <module> ValueError: I/O operation on closed file. 7.2.1. Methods of File Objects The rest of the examples in this section will assume that a file object called f has already been created. To read a file’s contents, call f.read(size), which reads some quantity of data and returns it as a string (in text mode) or bytes object (in binary mode). size is an optional numeric argument. When size is omitted or negative, the entire contents of the file will be read and returned; it’s your problem if the file is twice as large as your machine’s memory. Otherwise, at most size characters (in text mode) or size bytes (in binary mode) are read and returned. If the end of the file has been reached, f.read() will return an empty string (''). 
>>> f.read() "This is the entire file.\n" >>> f.read() "" f.readline() reads a single line from the file; a newline character (\n) is left at the end of the string, and is only omitted on the last line of the file if the file doesn’t end in a newline. This makes the return value unambiguous; if f.readline() returns an empty string, the end of the file has been reached, while a blank line is represented by '\n', a string containing only a single newline. >>> f.readline() "This is the first line of the file.\n" >>> f.readline() "Second line of the file\n" >>> f.readline() "" For reading lines from a file, you can loop over the file object. This is memory efficient, fast, and leads to simple code: >>> for line in f: ... print(line, end=") ... This is the first line of the file. Second line of the file If you want to read all the lines of a file in a list you can also use list(f) or f.readlines(). f.write(string) writes the contents of string to the file, returning the number of characters written. >>> f.write("This is a test\n") 15 Other types of objects need to be converted – either to a string (in text mode) or a bytes object (in binary mode) – before writing them: >>> value = ("the answer", 42) >>> s = str(value) # convert the tuple to string >>> f.write(s) 18 f.tell() returns an integer giving the file object’s current position in the file represented as number of bytes from the beginning of the file when in binary mode and an opaque number when in text mode. To change the file object’s position, use f.seek(offset, whence). The position is computed from adding offset to a reference point; the reference point is selected by the whence argument. A whence value of 0 measures from the beginning of the file, 1 uses the current file position, and 2 uses the end of the file as the reference point. whence can be omitted and defaults to 0, using the beginning of the file as the reference point. >>> f = open("workfile", "rb+") >>> f.write(b"0123456789abcdef") 16 >>> f.seek(5) # Go to the 6th byte in the file 5 >>> f.read(1) b"5" >>> f.seek(-3, 2) # Go to the 3rd byte before the end 13 >>> f.read(1) b"d" In text files (those opened without a b in the mode string), only seeks relative to the beginning of the file are allowed (the exception being seeking to the very file end with seek(0, 2)) and the only valid offset values are those returned from the f.tell(), or zero. Any other offset value produces undefined behaviour. File objects have some additional methods, such as isatty() and truncate() which are less frequently used; consult the Library Reference for a complete guide to file objects. 7.2.2. Saving structured data with json Strings can easily be written to and read from a file. Numbers take a bit more effort, since the read() method only returns strings, which will have to be passed to a function like int(), which takes a string like '123' and returns its numeric value 123. When you want to save more complex data types like nested lists and dictionaries, parsing and serializing by hand becomes complicated. Rather than having users constantly writing and debugging code to save complicated data types to files, Python allows you to use the popular data interchange format called JSON (JavaScript Object Notation). The standard module called json can take Python data hierarchies, and convert them to string representations; this process is called serializing. Reconstructing the data from the string representation is called deserializing. 
Between serializing and deserializing, the string representing the object may have been stored in a file or data, or sent over a network connection to some distant machine. Note The JSON format is commonly used by modern applications to allow for data exchange. Many programmers are already familiar with it, which makes it a good choice for interoperability. If you have an object x, you can view its JSON string representation with a simple line of code: >>> import json >>> x = [1, "simple", "list"] >>> json.dumps(x) "[1, "simple", "list"]" Another variant of the dumps() function, called dump(), simply serializes the object to a text file. So if f is a text file object opened for writing, we can do this: json.dump(x, f) To decode the object again, if f is a binary file or text file object which has been opened for reading: x = json.load(f) Note JSON files must be encoded in UTF-8. Use encoding="utf-8" when opening JSON file as a text file for both of reading and writing. This simple serialization technique can handle lists and dictionaries, but serializing arbitrary class instances in JSON requires a bit of extra effort. The reference for the json module contains an explanation of this. See also pickle - the pickle module Contrary to JSON, pickle is a protocol which allows the serialization of arbitrarily complex Python objects. As such, it is specific to Python and cannot be used to communicate with applications written in other languages. It is also insecure by default: deserializing pickle data coming from an untrusted source can execute arbitrary code, if the data was crafted by a skilled attacker.

    Python Dictionaries

A dictionary is a collection that is ordered (as of Python 3.7), changeable, and does not allow duplicate keys. Dictionaries store data values in key:value pairs; assigning a value to an existing key overwrites the previous value.

thisdict = {
    "brand": "Ford",
    "model": "Mustang",
    "year": 1964
}
print(thisdict["brand"])
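A tiny illustration of the "duplicates overwrite" rule mentioned above:

thisdict = {"brand": "Ford", "model": "Mustang", "year": 1964, "year": 2020}
print(thisdict)   # {'brand': 'Ford', 'model': 'Mustang', 'year': 2020} -- the later "year" wins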

    Python PDF

    pip install fpdf
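The note above only lists the install command; here is a minimal hello-world sketch with the fpdf API (the file name and text are arbitrary):

from fpdf import FPDF

pdf = FPDF()
pdf.add_page()
pdf.set_font("Arial", size=14)
pdf.cell(0, 10, "Hello from fpdf!")   # a single cell of text spanning the page width
pdf.output("hello.pdf")               # write the PDF to disk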

10 Python Scripts to Automate Your Daily Tasks



    Parse and Extract HTML

    This automation script will help you to extract the HTML from the webpage URL and then also provide you function that you can use to Parse the HTML for data. This awesome script is a great treat for web scrapers and for those who want to Parse HTML for important data. # Parse and Extract HTML # pip install gazpacho import gazpacho # Extract HTML from URL url = 'https://www.example.com/' html = gazpacho.get(url) print(html) # Extract HTML with Headers headers = {'User-Agent': 'Mozilla/5.0'} html = gazpacho.get(url, headers=headers) print(html) # Parse HTML parse = gazpacho.Soup(html) # Find single tags tag1 = parse.find('h1') tag2 = parse.find('span') # Find multiple tags tags1 = parse.find_all('p') tags2 = parse.find_all('a') # Find tags by class tag = parse.find('.class') # Find tags by Attribute tag = parse.find("div", attrs={"class": "test"}) # Extract text from tags text = parse.find('h1').text text = parse.find_all('p')[0].text

    Qrcode Scanner

If you have a lot of QR images, or just want to scan one, this automation script will help. It uses the qrtools module to scan QR images programmatically.

# Qrcode Scanner
# pip install qrtools
from qrtools import Qr

def Scan_Qr(qr_img):
    qr = Qr()
    qr.decode(qr_img)
    print(qr.data)
    return qr.data

print("Your Qr Code is: ", Scan_Qr("qr.png"))

    Take Screenshots

Now you can take screenshots programmatically with the script below, either of the whole screen or of a specific area.

# Grab Screenshot
# pip install pyautogui
# pip install Pillow
from pyautogui import screenshot
import time
from PIL import ImageGrab

# Grab a screenshot of the whole screen
def grab_screenshot():
    shot = screenshot()
    shot.save('my_screenshot.png')

# Grab a screenshot of a specific area
def grab_screenshot_area():
    area = (0, 0, 500, 500)
    shot = ImageGrab.grab(area)
    shot.save('my_screenshot_area.png')

# Grab a screenshot after a delay
def grab_screenshot_delay():
    time.sleep(5)
    shot = screenshot()
    shot.save('my_screenshot_delay.png')

    Create AudioBooks

Tired of converting your PDF books to audiobooks manually? Then here is an automation script that uses the gTTS module to turn the text of a PDF into audio.

# Create Audiobooks
# pip install gTTS
# pip install PyPDF2
from PyPDF2 import PdfFileReader as reader
from gtts import gTTS

def create_audio(pdf_file):
    read_Pdf = reader(open(pdf_file, 'rb'))
    for page in range(read_Pdf.numPages):
        text = read_Pdf.getPage(page).extractText()
        tts = gTTS(text, lang='en')
        tts.save('page' + str(page) + '.mp3')

create_audio('book.pdf')

    PDF Editor

Use the automation script below to edit your PDF files with Python. It uses the PyPDF4 module (an updated fork of PyPDF2); below are common operations such as extracting text, removing pages, rotating and merging. It is a handy script when you have a lot of PDFs to edit or need PDF handling inside a Python project.

# PDF Editor
# pip install PyPDF4
import PyPDF4

# Extract the text from a PDF
def parse_text(pdf_file):
    reader = PyPDF4.PdfFileReader(pdf_file)
    for page in reader.pages:
        print(page.extractText())

# Keep only the listed pages (i.e. remove the rest)
def remove_page(pdf_file, page_numbers):
    reader = PyPDF4.PdfFileReader(pdf_file)
    writer = PyPDF4.PdfFileWriter()
    for index in page_numbers:
        writer.addPage(reader.pages[index])
    with open('rm.pdf', 'wb') as f:
        writer.write(f)

# Append a blank page to a PDF
def add_page(pdf_file):
    reader = PyPDF4.PdfFileReader(pdf_file)
    writer = PyPDF4.PdfFileWriter()
    for page in reader.pages:
        writer.addPage(page)
    writer.addBlankPage()
    with open('add.pdf', 'wb') as f:
        writer.write(f)

# Rotate all pages clockwise by 90 degrees
def rotate_page(pdf_file):
    reader = PyPDF4.PdfFileReader(pdf_file)
    writer = PyPDF4.PdfFileWriter()
    for page in reader.pages:
        page.rotateClockwise(90)
        writer.addPage(page)
    with open('rotate.pdf', 'wb') as f:
        writer.write(f)

# Merge two PDFs
def merge_pdfs(pdf_file1, pdf_file2):
    writer = PyPDF4.PdfFileWriter()
    for pdf in (pdf_file1, pdf_file2):
        reader = PyPDF4.PdfFileReader(pdf)
        for page in reader.pages:
            writer.addPage(page)
    with open('merge.pdf', 'wb') as f:
        writer.write(f)

    👉Mini Stackoverflow

    As a programmer I know we need StackOverflow every day but you no longer need to go and search on Google for it. Now get direct solutions in your CMD while you continue working on a project. By using Howdoi module you can get the StackOverflow solution in your command prompt or terminal. Below you can find some examples that you can try. # Automate Stackoverflow # pip install howdoi # Get Answers in CMD #example 1 > howdoi how do i install python3 # example 2 > howdoi selenium Enter keys # example 3 > howdoi how to install modules # example 4 > howdoi Parse html with python # example 5 > howdoi int not iterable error # example 6 > howdoi how to parse pdf with python # example 7 > howdoi Sort list in python # example 8 > howdoi merge two lists in python # example 9 >howdoi get last element in list python # example 10 > howdoi fast way to sort list

    Automate Mobile Phone

This automation script helps you automate your smartphone through the Android Debug Bridge (ADB) from Python. Below are common tasks such as swipe gestures, calling, sending SMS, and more; the adb tool itself must be installed and on your PATH. You can learn more about ADB and explore more ways to automate your phone to make life easier.

# Automate Mobile Phones via adb (no extra pip package needed; subprocess is in the standard library)
import subprocess

def main_adb(cm):
    p = subprocess.Popen(cm.split(' '), stdout=subprocess.PIPE, shell=True)
    (output, _) = p.communicate()
    return output.decode('utf-8')

# Swipe
def swipe(x1, y1, x2, y2, duration):
    cmd = 'adb shell input swipe {} {} {} {} {}'.format(x1, y1, x2, y2, duration)
    return main_adb(cmd)

# Tap / click
def tap(x, y):
    cmd = 'adb shell input tap {} {}'.format(x, y)
    return main_adb(cmd)

# Make a call
def make_call(number):
    cmd = f"adb shell am start -a android.intent.action.CALL -d tel:{number}"
    return main_adb(cmd)

# Send an SMS
def send_sms(number, message):
    cmd = 'adb shell am start -a android.intent.action.SENDTO -d sms:{} --es sms_body "{}"'.format(number, message)
    return main_adb(cmd)

# Download a file from the phone to the PC
def download_file(file_name):
    cmd = 'adb pull /sdcard/{}'.format(file_name)
    return main_adb(cmd)

# Take a screenshot
def screenshot():
    cmd = 'adb shell screencap -p'
    return main_adb(cmd)

# Toggle power (key event 26 is the power button)
def power_off():
    cmd = 'adb shell input keyevent 26'
    return main_adb(cmd)

    Monitor CPU/GPU Temp

You probably use CPU-Z or some other monitoring software to watch your CPU and GPU temperature, but you can do that programmatically too. This script uses pythonnet together with the OpenHardwareMonitor library to read the current CPU and GPU temperatures. You can use it to notify yourself when a certain temperature is reached, or drop it into a Python project.

# Get CPU/GPU Temperature
# pip install pythonnet
import clr
clr.AddReference("OpenHardwareMonitorLib")
from OpenHardwareMonitorLib import *

spec = Computer()
spec.GPUEnabled = True
spec.CPUEnabled = True
spec.Open()

# Get CPU temperature (Hardware[0] is assumed to be the CPU here)
def Cpu_Temp():
    while True:
        for cpu in range(0, len(spec.Hardware[0].Sensors)):
            if "/temperature" in str(spec.Hardware[0].Sensors[cpu].Identifier):
                print(str(spec.Hardware[0].Sensors[cpu].Value))

# Get GPU temperature (Hardware[1] is assumed to be the GPU; adjust the index for your machine)
def Gpu_Temp():
    while True:
        for gpu in range(0, len(spec.Hardware[1].Sensors)):
            if "/temperature" in str(spec.Hardware[1].Sensors[gpu].Identifier):
                print(str(spec.Hardware[1].Sensors[gpu].Value))

    Instagram Uploader Bot

    Instagram is a well famous social media platform and you know you don’t need to upload your photos or video through your smartphone now. You can do it programmatically by using the below script. # Upload Photos and Video on Insta # pip install instabot from instabot import Bot def Upload_Photo(img): robot = Bot() robot.login(username="user", password="pass") robot.upload_photo(img, caption="Medium Article") print("Photo Uploaded") def Upload_Video(video): robot = Bot() robot.login(username="user", password="pass") robot.upload_video(video, caption="Medium Article") print("Video Uploaded") def Upload_Story(img): robot = Bot() robot.login(username="user", password="pass") robot.upload_story(img, caption="Medium Article") print("Story Photos Uploaded") Upload_Photo("img.jpg") Upload_Video("video.mp4")

    Video Watermarker

    Add watermark to your videos by using this automation script which uses Moviepy which is a handy module for video editing. In the below script, you can see how you can watermark and you are free to use it. # Video Watermark with Python # pip install moviepy from moviepy.editor import * clip = VideoFileClip("myvideo.mp4", audio=True) width,height = clip.size text = TextClip("WaterMark", font='Arial', color='white', fontsize=28) set_color = text.on_color(size=(clip.w + text.w, text.h-10), color=(0,0,0), pos=(6,'center'), col_opacity=0.6) set_textPos = set_color.set_pos( lambda pos: (max(width/30,int(width-0.5* width* pos)),max(5*height/6,int(100* pos))) ) Output = CompositeVideoClip([clip, set_textPos]) Output.duration = clip.duration Output.write_videofile("output.mp4", fps=30, codec='libx264')

Reading a macro inside an Excel file using Python

    use the xlwings library. xlwings allows you to interact with Excel files and access macros. Install the xlwings library by running the following command in your Python environment: pip install xlwings import xlwings as xw Use the xw.Book() function to open the Excel file containing the macro: wb = xw.Book('path_to_your_excel_file.xlsx') Access the macro within the Excel file using the macro attribute of the Workbook object: macro_code = wb.macro('macro_name') Replace `'macro_name'` with the name of the macro you want to read. Print or manipulate the macro_code as needed: print(macro_code) You can save it to a file or process it further, depending on your requirements. Close the workbook after you have finished reading the macro: wb.close() To list out all macros Access the macros attribute of the Workbook object to obtain a list of all macros: macro_list = wb.macro_names Print or process the macro_list as needed: for macro_name in macro_list: print(macro_name)
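Note that xlwings' macro() is primarily designed for running a macro rather than inspecting its VBA source. A minimal sketch of running one (the workbook and macro names are hypothetical):

import xlwings as xw

wb = xw.Book("report.xlsm")                      # opens the workbook in Excel
build_report = wb.macro("Module1.BuildReport")   # wrapper around the VBA Sub
build_report()                                   # executes the macro
wb.save()
wb.close()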

14 common Excel operations done with Python (pandas)



Lookup across tables: VLOOKUP

VLOOKUP is probably the most frequently used Excel formula; it is typically used to join two tables. So I first split the sales table into two tables:

df1 = sale[['订单明细号', '单据日期', '地区名称', '业务员名称', '客户分类', '存货编码', '客户名称', '业务员编码', '存货名称', '订单号', '客户编码', '部门名称', '部门编码']]
df2 = sale[['订单明细号', '存货分类', '税费', '不含税金额', '订单金额', '利润', '单价', '数量']]

Requirement: find the profit for every order in df1. The profit column lives in df2. In Excel you would first confirm that the order detail number (订单明细号) is unique, then add a column in df1 with =VLOOKUP(A2, df2!A:H, 6, 0) and drag it down (I won't write out the Excel steps for the remaining 13 operations). How is this done in Python?

# Check whether the order detail number has duplicates -- it doesn't.
df1["订单明细号"].duplicated().value_counts()
df2["订单明细号"].duplicated().value_counts()
df_c = pd.merge(df1, df2, on="订单明细号", how="left")

Pivot tables

Requirement: for each region, find the total and average profit earned by each salesperson.

pd.pivot_table(sale, index="地区名称", columns="业务员名称", values="利润", aggfunc=[np.sum, np.mean])

Comparing two columns for differences

Because every column in this table has a different meaning, comparing them directly would be pointless, so I first create a variant of the order detail number to compare against. Requirement: compare 订单明细号 with 订单明细号2 and show the differences.

sale["订单明细号2"] = sale["订单明细号"]
# Add 1 to the first few values of 订单明细号2
sale["订单明细号2"][1:10] = sale["订单明细号2"][1:10] + 1
# Output the differences
result = sale.loc[sale["订单明细号"].isin(sale["订单明细号2"]) == False]

Dropping duplicates

Requirement: drop rows with duplicate salesperson codes (业务员编码).

sale.drop_duplicates("业务员编码", inplace=True)

Handling missing values

First check which columns of the sales data contain missing values.

# A column whose count is smaller than the index length has missing values; here 客户名称 is 329 < 335.
sale.info()

Requirement: fill the missing values with 0, or drop the rows whose customer code (客户编码) is missing. In practice missing-value handling can be much more involved; this only shows the simple approaches. For numeric variables the mean, median or mode is most common, and more sophisticated approaches predict the value from other columns with something like a random forest. For categorical variables, filling based on business logic tends to be more accurate; the missing customer names here, for example, could be filled using the customer name most frequently associated with the same inventory category. Here we use the simple approach: fill with 0, or drop the rows with missing customer codes.

# Fill missing values with 0
sale["客户名称"] = sale["客户名称"].fillna(0)
# Drop rows with missing customer codes
sale.dropna(subset=["客户编码"])

Filtering on multiple conditions

Requirement: find the orders where salesperson 张爱 sold goods in the 北京 (Beijing) region with an order amount over 5,000.

sale.loc[(sale["地区名称"] == "北京") & (sale["业务员名称"] == "张爱") & (sale["订单金额"] > 5000)]

Fuzzy (substring) filtering

Requirement: filter the rows whose inventory name (存货名称) contains "三星" (Samsung) or "索尼" (Sony).

sale.loc[sale["存货名称"].str.contains("三星|索尼")]

Grouped subtotals

Requirement: total profit of each salesperson in the 北京 region.

sale.groupby(["地区名称", "业务员名称"])["利润"].sum()

Conditional calculations

Requirement: how many orders have an inventory name containing "三星" and tax above 1,000, and what are the total and average profit of those orders (or the min, max, quartiles and standard deviation)?

sale.loc[sale["存货名称"].str.contains("三星") & (sale["税费"] >= 1000)][["订单明细号", "利润"]].describe()

Stripping whitespace

Requirement: strip the whitespace around the inventory name (存货名称).

sale["存货名称"].map(lambda s: s.strip())

Splitting one column into several

Requirement: split the document date (单据日期) into separate date and time columns.

sale = pd.merge(sale, pd.DataFrame(sale["单据日期"].str.split(" ", expand=True)), how="inner", left_index=True, right_index=True)

Replacing outliers

First use describe() to take a quick look at whether there are any outliers.

# The output tax can be negative, which normally shouldn't happen, so treat it as an outlier.
sale.describe()

Requirement: replace the outlier with 0.

sale["订单金额"] = sale["订单金额"].replace(min(sale["订单金额"]), 0)

Binning into groups

Requirement: based on the distribution of profit, bin the regions into "较差" (poor), "中等" (average), "较好" (good), and "非常好" (excellent). First, of course, look at the profit distribution; here we use the quartiles.

sale.groupby("地区名称")["利润"].sum().describe()

Based on the quartiles, regions with total profit in [-9, 7091] are labelled 较差, (7091, 10952] are 中等, (10952, 17656] are 较好, and (17656, 37556] are 非常好.

# First build a DataFrame
sale_area = pd.DataFrame(sale.groupby("地区名称")["利润"].sum()).reset_index()
# Set the bins and the group names
bins = [-10, 7091, 10952, 17656, 37556]
groups = ["较差", "中等", "较好", "非常好"]
# Use cut to bin the values
# sale_area["分组"] = pd.cut(sale_area["利润"], bins, labels=groups)

Labelling rows according to business logic

Requirement: mark items whose sales margin (profit / order amount) is above 30% as premium goods (优质商品) and those below 5% as ordinary goods (一般商品).

sale.loc[(sale["利润"] / sale["订单金额"]) > 0.3, "label"] = "优质商品"
sale.loc[(sale["利润"] / sale["订单金额"]) < 0.05, "label"] = "一般商品"

There are of course many more everyday Excel operations; I have only listed the 14 I use most often. If you want to see other operations, feel free to discuss in the comments. I also know my Python is not as concise as it could be (I habitually use loc, while query would be terser). Finally, I don't think it is worth pitting Excel against Python to decide which is "better": both are tools, and Excel, having dominated data processing for so many years, is genuinely excellent at it. Some operations really are simpler in Python, but plenty are simpler in Excel. A trivial example: summing every column and showing the result as a last row is a single sum() formula dragged across the sheet in Excel, whereas in Python you have to write a function, because non-numeric columns will raise an error if you simply sum them. In short: whichever tool you use, a good data analyst is the one who solves the problem!
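For completeness, the column-total example from the closing paragraph is still fairly short in pandas; a sketch, assuming the same sale DataFrame used throughout this section:

# Sum every numeric column and append the result as a final "合计" (total) row
numeric_sums = sale.sum(numeric_only=True)          # non-numeric columns are simply skipped
sale_with_total = pd.concat([sale, numeric_sums.to_frame("合计").T])
print(sale_with_total.tail(1))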

    List Comprehension



    Python Comprehensions

    Using Python Comprehensions
    python list comprehension exercise K’th Non-repeating Character in Python using List Comprehension and OrderedDict

    List Comprehension

    List comprehension offers a shorter syntax when you want to create a new list based on the values of an existing list. Example: Based on a list of fruits, you want a new list, containing only the fruits with the letter "a" in the name. Without list comprehension you will have to write a for statement with a conditional test inside: Example fruits = ["apple", "banana", "cherry", "kiwi", "mango"] newlist = [] for x in fruits: if "a" in x: newlist.append(x) print(newlist) With list comprehension you can do all that with only one line of code: Example fruits = ["apple", "banana", "cherry", "kiwi", "mango"] newlist = [x for x in fruits if "a" in x] print(newlist)


    The Syntax

    newlist = [expression for item in iterable if condition == True] The return value is a new list, leaving the old list unchanged.

    Condition

The condition is like a filter that only accepts the items that evaluate to True.


    Example Only accept items that are not "apple":

    newlist = [x for x in fruits if x != "apple"] The condition if x != "apple" will return True for all elements other than "apple", making the new list contain all fruits except "apple". The condition is optional and can be omitted:


    Example With no if statement:

    newlist = [x for x in fruits]


    Iterable

    The iterable can be any iterable object, like a list, tuple, set etc.


    Example You can use the range() function to create an iterable:

    newlist = [x for x in range(10)] Same example, but with a condition:


    Example Accept only numbers lower than 5:

    newlist = [x for x in range(10) if x < 5]


    Expression

    The expression is the current item in the iteration, but it is also the outcome, which you can manipulate before it ends up like a list item in the new list:


    Example Set the values in the new list to upper case:

    newlist = [x.upper() for x in fruits] You can set the outcome to whatever you like:


    Example Set all values in the new list to 'hello':

    newlist = ['hello' for x in fruits] The expression can also contain conditions, not like a filter, but as a way to manipulate the outcome:


    Example Return "orange" instead of "banana":

    newlist = [x if x != "banana" else "orange" for x in fruits] The expression in the example above says: "Return the item if it is not banana, if it is banana return orange".
    1、For 循环
    for 循环是一个多行语句,但是在 Python 中,我们可以使用 List Comprehension 方法在一行中编写 for 循环。 让我们以过滤小于 250 的值为例。 示例代码如下: #For loop in One line mylist = [100, 200, 300, 400, 500] #Orignal way result = [] for x in mylist: if x > 250: result.append(x) print(result) # [300, 400, 500] #One Line Way result = [x for x in mylist if x > 250] print(result) # [300, 400, 500] 2、 While 循环 这个 One-Liner 片段将向您展示如何在 One Line 中使用 While 循环代码,在这里,我已经展示了两种方法。 代码如下: #method 1 Single Statement while True: print(1) # infinite 1 #method 2 Multiple Statement x = 0 while x < 5: print(x); x= x + 1 # 0 1 2 3 4 5 3、IF Else 语句 好吧,要在 One Line 中编写 IF Else 语句,我们将使用三元运算符。 三元的语法是“[on true] if [expression] else [on false]”。 我在下面的示例代码中展示了 3 个示例,以使您清楚地了解如何将三元运算符用于一行 if-else 语句,要使用 Elif 语句,我们必须使用多个三元运算符。 #if Else in One Line #Example 1 if else print("Yes") if 8 > 9 else print("No") # No #Example 2 if elif else E = 2 print("High") if E == 5 else print("Meidum") if E == 2 else print("Low") # Medium #Example 3 only if if 3 > 2: print("Exactly") # Exactly 4、合并字典 这个单行代码段将向您展示如何使用一行代码将两个字典合并为一个。 下面我展示了两种合并字典的方法。 # Merge Dictionary in One Line d1 = { 'A': 1, 'B': 2 } d2 = { 'C': 3, 'D': 4 } #method 1 d1.update(d2) print(d1) # {'A': 1, 'B': 2, 'C': 3, 'D': 4} #method 2 d3 = {**d1, **d2} print(d3) # {'A': 1, 'B': 2, 'C': 3, 'D': 4} 5、编写函数 我们有两种方法可以在一行中编写函数,在第一种方法中,我们将使用与三元运算符或单行循环方法相同的函数定义。 第二种方法是用 lambda 定义函数,查看下面的示例代码以获得更清晰的理解。 #Function in One Line #method 1 def fun(x): return True if x % 2 == 0 else False print(fun(2)) # False #method 2 fun = lambda x : x % 2 == 0 print(fun(2)) # True print(fun(3)) # False 6、单行递归 这个单行代码片段将展示如何在一行中使用递归,我们将使用一行函数定义和一行 if-else 语句,下面是查找斐波那契数的示例。 # Recursion in One Line #Fibonaci example with one line Recursion def Fib(x): return 1 if x in {0, 1} else Fib(x-1) + Fib(x-2) print(Fib(5)) # 8 print(Fib(15)) # 987 7、数组过滤 Python 列表可以通过使用列表推导方法在一行代码中进行过滤,让我们以过滤偶数列表为例。 # Array Filtering in One Line mylist = [2, 3, 5, 8, 9, 12, 13, 15] #Normal Way result = [] for x in mylist: if x % 2 == 0: result.append(x) print(result) # [2, 8, 12] #One Line Way result = [x for x in mylist if x % 2 == 0] print(result) # [2, 8, 12] 8、异常处理 我们使用异常处理来处理 Python 中的运行时错误,你知道我们可以在 One-Line 中编写这个 Try except 语句吗?通过使用 exec() 语句,我们可以做到这一点。 # Exception Handling in One Line #Original Way try: print(x) except: print("Error") #One Line Way exec('try:print(x) \nexcept:print("Error")') # Error 9、列出字典 我们可以使用 Python enumerate() 函数将 List 转换为 Dictionary in One Line,在 enumerate() 中传递列表并使用 dict() 将最终输出转换为字典格式。 # Dictionary in One line mydict = ["John", "Peter", "Mathew", "Tom"] mydict = dict(enumerate(mydict)) print(mydict) # {0: 'John', 1: 'Peter', 2: 'Mathew', 3: 'Tom'} 10、多变量赋值 Python 允许在一行中进行多个变量赋值,下面的示例代码将向您展示如何做到这一点。 #Multi Line Variable #Normal Way x = 5 y = 7 z = 10 print(x , y, z) # 5 7 10 #One Line way a, b, c = 5, 7, 10 print(a, b, c) # 5 7 10 11、交换 交换是编程中一项有趣的任务,并且总是需要第三个变量名称 temp 来保存交换值。 这个单行代码段将向您展示如何在没有任何临时变量的情况下交换一行中的值。 #Swap in One Line #Normal way v1 = 100 v2 = 200 temp = v1 v1 = v2 v2 = temp print(v1, v2) # 200 100 # One Line Swapping v1, v2 = v2, v1 print(v1, v2) # 200 100 12、排序 排序是编程中的一个普遍问题,Python 有许多内置的方法来解决这个排序问题,下面的代码示例将展示如何在一行中进行排序。 # Sort in One Line mylist = [32, 22, 11, 4, 6, 8, 12] # method 1 mylist.sort() print(mylist) # # [4, 6, 8, 11, 12, 22, 32] print(sorted(mylist)) # [4, 6, 8, 11, 12, 22, 32] 13、读取文件 不使用语句或正常读取方法,也可以正确读取一行文件。 #Read File in One Line #Normal Way with open("data.txt", "r") as file: data = file.readline() print(data) # Hello world #One Line Way data = [line.strip() for line in open("data.txt","r")] print(data) # ['hello 
world', 'Hello Python'] 14、类 类总是多线工作,但是在 Python 中,有一些方法可以在一行代码中使用类特性。 # Class in One Line #Normal way class Emp: def __init__(self, name, age): self.name = name self.age = age emp1 = Emp("Haider", 22) print(emp1.name, emp1.age) # Haider 22 #One Line Way #method 1 Lambda with Dynamic Artibutes Emp = lambda: None; Emp.name = "Haider"; Emp.age = 22 print(Emp.name, Emp.age) # Haider 22 #method 2 from collections import namedtuple Emp = namedtuple('Emp', ["name", "age"]) ("Haider", 22) print(Emp.name, Emp.age) # Haider 22 15、分号 一行代码片段中的分号将向您展示如何使用分号在一行中编写多行代码。 # Semi colon in One Line #example 1 a = "Python"; b = "Programming"; c = "Language"; print(a, b, c) #output: # Python Programming Language 16、打印 这不是很重要的 Snippet,但有时当您不需要使用循环来执行任务时它很有用。 # Print in One Line #Normal Way for x in range(1, 5): print(x) # 1 2 3 4 #One Line Way print(*range(1, 5)) # 1 2 3 4 print(*range(1, 6)) # 1 2 3 4 5 17、Map 函数 Map 函数是适用的高阶函数,这将函数应用于每个元素,下面是我们如何在一行代码中使用 map 函数的示例。 #Map in One Line print(list(map(lambda a: a + 2, [5, 6, 7, 8, 9, 10]))) #output # [7, 8, 9, 10, 11, 12] 18、删除列表中的 Mul 元素 您现在可以使用 del 方法在一行代码中删除 List 中的多个元素,只需稍作修改。 # Delete Mul Element in One Line mylist = [100, 200, 300, 400, 500] del mylist[1::2] print(mylist) # [100, 300, 500] 19、打印图案 现在您不再需要使用 Loop 来打印相同的图案,您可以使用 Print 语句和星号 (*) 在一行代码中执行相同的操作。 # Print Pattern in One Line # Normal Way for x in range(3): print('😀') # output # 😀 😀 😀 #One Line way print('😀' * 3) # 😀 😀 😀 print('😀' * 2) # 😀 😀 print('😀' * 1) # 😀 20、查找质数 此代码段将向您展示如何编写单行代码来查找范围内的质数。 # Find Prime Number print(list(filter(lambda a: all(a % b != 0 for b in range(2, a)), range(2,20)))) #Output # [2, 3, 5, 7, 11, 13, 17, 19]

    student info management

    # 学生信息放在字典里面 student_info = [ {'姓名': '婧琪', '语文': 60, '数学': 60, '英语': 60, '总分': 180}, {'姓名': '巳月', '语文': 60, '数学': 60, '英语': 60, '总分': 180}, {'姓名': '落落', '语文': 60, '数学': 60, '英语': 60, '总分': 180}, ] # 死循环 while True # 源码自取君羊:708525271 while True: print(msg) num = input('请输入你想要进行操作: ') # 进行判断, 判断输入内容是什么, 然后返回相应结果 if num == '1': name = input('请输入学生姓名: ') chinese = int(input('请输入语文成绩: ')) math = int(input('请输入数学成绩: ')) english = int(input('请输入英语成绩: ')) score = chinese + math + english # 总分 student_dit = { # 把信息内容, 放入字典里面 '姓名': name, '语文': chinese, '数学': math, '英语': english, '总分': score, } student_info.append(student_dit) # 把学生信息 添加到列表里面 elif num == '2': print('姓名\t\t语文\t\t数学\t\t英语\t\t总分') for student in student_info: print( student['姓名'], '\t\t', student['语文'], '\t\t', student['数学'], '\t\t', student['英语'], '\t\t', student['总分'], ) elif num == '3': name = input('请输入查询学生姓名: ') for student in student_info: if name == student['姓名']: # 判断 查询名字和学生名字 是否一致 print('姓名\t\t语文\t\t数学\t\t英语\t\t总分') print( student['姓名'], '\t\t', student['语文'], '\t\t', student['数学'], '\t\t', student['英语'], '\t\t', student['总分'], ) break else: print('查无此人, 没有{}学生信息!'.format(name)) elif num == '4': name = input('请输入删除学生姓名: ') for student in student_info: if name == student['姓名']: print('姓名\t\t语文\t\t数学\t\t英语\t\t总分') print( student['姓名'], '\t\t', student['语文'], '\t\t', student['数学'], '\t\t', student['英语'], '\t\t', student['总分'], ) choose = input(f'是否确定要删除{name}信息(y/n)') if choose == 'y' or choose == 'Y': student_info.remove(student) print(f'{name}信息已经被删除!') break elif choose == 'n' or choose == 'N': break else: print('查无此人, 没有{}学生信息!'.format(name)) elif num == '5': print('修改学生信息') name = input('请输入删除学生姓名: ') for student in student_info: if name == student['姓名']: print('姓名\t\t语文\t\t数学\t\t英语\t\t总分') print( student['姓名'], '\t\t', student['语文'], '\t\t', student['数学'], '\t\t', student['英语'], '\t\t', student['总分'], ) choose = input(f'是否要修改{name}信息(y/n)') if choose == 'y' or choose == 'Y': name = input('请输入学生姓名: ') chinese = int(input('请输入语文成绩: ')) math = int(input('请输入数学成绩: ')) english = int(input('请输入英语成绩: ')) score = chinese + math + english # 总分 student['姓名'] = name student['语文'] = chinese student['数学'] = math student['英语'] = english student['总分'] = score print(f'{name}信息已经修改了!') break elif choose == 'n' or choose == 'N': # 跳出循环 break else: print('查无此人, 没有{}学生信息!'.format(name))
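The script above prints a menu variable msg that the extracted code never defines. A minimal definition consistent with the five options the loop handles (the wording is my own guess, not the original author's) would be:

msg = """
===== Student info management =====
1. Add a student
2. Show all students
3. Query a student by name
4. Delete a student
5. Modify a student
===================================
"""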

    if else 升级新语法

    Python 从 if else 优化到 match case Python 是一门非常重 if else 的语言 以前 Python 真的是把 if else 用到了极致,比如说 Python 里面没有三元运算符( xx ? y : z ) 无所谓,它可以用 if else 整一个。 x = True if 100 > 0 else False离谱的事还没有完,if else 这两老六还可以分别与其它语法结合,其中又数 else 玩的最野。 a: else 可以和 try 玩到一起,当 try 中没有引发异常的时候 else 块会得到执行。 #!/usr/bin/env python3 # -*- coding: utf8 -*- def main(): try: # ... pass except Exception as err: pass else: print("this is else block") finally: print("finally block") if __name__ == "__main__": main()b: else 也可以配合循环语句使用,当循环体中没有执行 break 语句时 else 块能得到执行。 #!/usr/bin/env python3 # -*- coding: utf8 -*- def main(): for i in range(3): pass else: print("this is else block") while False: pass else: print("this is else block") if __name__ == "__main__": main() c: if 相对来说就没有 else 那么多的副业;常见的就是列表推导。 以过滤出列表中的偶数为例,传统上我们的代码可能是这样的。 #!/usr/bin/env python3 # -*- coding: utf8 -*- def main(): result = [] numers = [1, 2, 3, 4, 5] for number in numers: if number % 2 == 0: result.append(number) print(result) if __name__ == "__main__": main()使用列表推导可以一行解决。 #!/usr/bin/env python3 # -*- coding: utf8 -*- def main(): numers = [1, 2, 3, 4, 5] print( [_ for _ in numers if _ % 2 == 0] ) if __name__ == "__main__": main()看起来这些增强都还可以,但是对于类似于 switch 的这些场景,就不理想了。 没有 switch 语句 if else 顶上 对于 Python 这种把 if else 在语法上用到极致的语言,没有 switch 语句没关系的,它可以用 if else !!! #!/usr/bin/env python3 # -*- coding: utf8 -*- def fun(times): """这个函数不是我们测试的重点这里直接留白 Parameter --------- times: int """ pass def main(case_id: int): """由 case_id 到调用函数还有其它逻辑,这里为了简单统一处理在 100 * case_id Parameter --------- times: int """ if case_id == 1: fun(100 * 1) elif case_id == 2: fun(100 * 2) elif case_id == 3: fun(100 * 3) elif case_id == 4: fun(100 * 4) if __name__ == "__main__": main(1)这个代码写出来大家应该发现了,这样的代码像流水账一样一点都不优雅,用 Python 的话来说,这个叫一点都不 Pythonic其它语言不好说,对于 Python 来讲不优雅就是有罪。 前面铺垫了这么多,终于快到重点了。 社区提出了一个相对优雅的写法,新写法完全不用 if else 。 #!/usr/bin/env python3 # -*- coding: utf8 -*- def fun(times): pass # 用字典以 case 为键,要执行的函数对象为值,这样做到按 case 路由 routers = { 1: fun, 2: fun, 3: fun, 4: fun } def main(case_id: int): routers[case_id](100 * case_id) if __name__ == "__main__": main(1)可以看到新的写法下,代码确实简洁了不少;从另一个角度来看社区也完成了一次进化,从之前抱着 if else 这个传家宝不放,到完全不用 if else 。 也算是非常有意思吧。 新写法也不是没有问题;性能!性能!还是他妈的性能不行! if else 和宝典写法性能测试 在说测试结果之前,先介绍一下我的开发环境,腾讯云的虚拟机器,Python 版本是 Python-3.12.0a3 。 测试代码会记录耗时和内存开销,耗时小的性能就好。 详细的代码如下。 #!/usr/bin/env python3 # -*- coding: utf8 -*- import timeit import tracemalloc tracemalloc.start() def fun(times): """这个函数不是我们测试的重点这里直接留白 Parameter --------- times: int """ pass # 定义 case 到 操作的路由字典 routers = { 1: fun, 2: fun, 3: fun, 4: fun } def main(case_id: int): """用于测试 if else 写法的耗时情况 Parametr -------- case_id: int 不同 case 的唯一标识 Return ------ None """ if case_id == 1: fun(100 * 1) elif case_id == 2: fun(100 * 2) elif case_id == 3: fun(100 * 3) elif case_id == 4: fun(100 * 4) def main(case_id: int): """测试字典定法的耗时情况 Parametr -------- case_id: int 不同 case 的唯一标识 Return ------ None """ routers[case_id](100 * case_id) if __name__ == "__main__": # 1. 记录开始时间、内存 # 2. 性能测试 # 3. 
记录结束时间和总的耗时情况 start_current, start_peak = tracemalloc.get_traced_memory() start_at = timeit.default_timer() for i in range(10000000): main((i % 4) + 1) end_at = timeit.timeit() cost = timeit.default_timer() - start_at end_current, end_peak = tracemalloc.get_traced_memory() print(f"time cost = {cost} .") print(f"memery cost = {end_current - start_current}, {end_peak - start_peak}")下面直接上我在开发环境的测试结果。 文字版本。 可以看到字典写法虽然优雅了一些,但是它在性能上是不行的。 故事讲到这里,我们这次的主角要上场了。 match case 新语法 Python-3.10 版本引入了一个新的语法 match case ,这个新语法和其它语言的 switch case 差不多。 在性能上比字典写法好一点,在代码的优雅程度上比 if else 好一点。 大致语法像这样。 match xxx: case aaa: ... case bbb: ... case ccc: ... case ddd: ... 光说不练,假把式!改一下我们的测试代码然后比较一下三者的性能差异。 #!/usr/bin/env python3 # -*- coding: utf8 -*- import timeit import tracemalloc tracemalloc.start() def fun(times): """这个函数不是我们测试的重点这里直接留白 Parameter --------- times: int """ pass # 定义 case 到 操作的路由字典 routers = { 1: fun, 2: fun, 3: fun, 4: fun } def main(case_id: int): """用于测试 if else 写法的耗时情况 Parametr -------- case_id: int 不同 case 的唯一标识 Return ------ None """ if case_id == 1: fun(100 * 1) elif case_id == 2: fun(100 * 2) elif case_id == 3: fun(100 * 3) elif case_id == 4: fun(100 * 4) def main(case_id: int): """测试字典定法的耗时情况 Parametr -------- case_id: int 不同 case 的唯一标识 Return ------ None """ routers[case_id](100 * case_id) def main(case_id: int): """测试 match case 写法的耗时情况 Parametr -------- case_id: int 不同 case 的唯一标识 Return ------ None """ match case_id: case 1: fun(100 * 1) case 2: fun(100 * 2) case 3: fun(100 * 3) case 4: fun(100 * 4) if __name__ == "__main__": # 1. 记录开始时间、内存 # 2. 性能测试 # 3. 记录结束时间和总的耗时情况 start_current, start_peak = tracemalloc.get_traced_memory() start_at = timeit.default_timer() for i in range(10000000): main((i % 4) + 1) end_at = timeit.timeit() cost = timeit.default_timer() - start_at end_current, end_peak = tracemalloc.get_traced_memory() print(f"time cost = {cost} .") print(f"memery cost = {end_current - start_current}, {end_peak - start_peak}") 可以看到 match case 耗时还是比较理想的。 详细的数据如下。
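Note that the combined benchmark above defines main three times in the same module, so only the last definition (the match case version) is actually timed, and end_at = timeit.timeit() is a leftover that is never used. A self-contained sketch that times the three dispatch styles separately (function names such as dispatch_if are my own; match requires Python 3.10+):

import timeit

def fun(times):
    pass

routers = {1: fun, 2: fun, 3: fun, 4: fun}

def dispatch_if(case_id):
    if case_id == 1:
        fun(100 * 1)
    elif case_id == 2:
        fun(100 * 2)
    elif case_id == 3:
        fun(100 * 3)
    elif case_id == 4:
        fun(100 * 4)

def dispatch_dict(case_id):
    routers[case_id](100 * case_id)

def dispatch_match(case_id):
    match case_id:
        case 1:
            fun(100 * 1)
        case 2:
            fun(100 * 2)
        case 3:
            fun(100 * 3)
        case 4:
            fun(100 * 4)

if __name__ == "__main__":
    for name, dispatcher in [("if/elif", dispatch_if),
                             ("dict", dispatch_dict),
                             ("match", dispatch_match)]:
        start = timeit.default_timer()
        for i in range(1_000_000):
            dispatcher((i % 4) + 1)
        print(f"{name}: {timeit.default_timer() - start:.3f}s")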

    20 Python libraries

1. Requests. The most famous HTTP library, written by Kenneth Reitz. It's a must-have for every Python developer.
2. Scrapy. If you are involved in web scraping, this is a must-have library. After using it you won't use any other.
3. wxPython. A GUI toolkit for Python. I have primarily used it in place of tkinter. You will really love it.
4. Pillow. A friendly fork of PIL (Python Imaging Library). It is more user-friendly than PIL and is a must-have for anyone who works with images.
5. SQLAlchemy. A database library. Many love it and many hate it. The choice is yours.
6. BeautifulSoup. I know it's slow, but this XML and HTML parsing library is very useful for beginners.
7. Twisted. The most important tool for any network application developer. It has a very beautiful API and is used by a lot of famous Python developers.
8. NumPy. How can we leave out this very important library? It provides advanced math functionality to Python.
9. SciPy. When we talk about NumPy, we have to talk about SciPy. It is a library of algorithms and mathematical tools for Python and has caused many scientists to switch from Ruby to Python.
10. matplotlib. A numerical plotting library. It is very useful for any data scientist or data analyst.
11. Pygame. Which developer does not like to play games and develop them? This library will help you achieve your goal of 2D game development.
12. Pyglet. A 3D animation and game creation engine. This is the engine in which the famous Python port of Minecraft was made.
13. PyQt. A GUI toolkit for Python. It is my second choice after wxPython for developing GUIs for my Python scripts.
14. PyGTK. Another Python GUI library. It is the same library in which the famous BitTorrent client was created.
15. Scapy. A packet sniffer and analyzer for Python, made in Python.
16. pywin32. A Python library which provides some useful methods and classes for interacting with Windows.
17. nltk. Natural Language Toolkit – I realize most people won't be using this one, but it's generic enough. It is a very useful library if you want to manipulate strings, but its capacity goes beyond that. Do check it out.
18. nose. A testing framework for Python. It is used by millions of Python developers. It is a must-have if you do test-driven development.
19. SymPy. SymPy can do algebraic evaluation, differentiation, expansion, complex numbers, etc. It is contained in a pure Python distribution.
20. IPython. I just can't stress enough how useful this tool is. It is a Python prompt on steroids. It has completion, history, shell capabilities, and a lot more. Make sure you take a look at it.

    Python实现文本自动播读

    用Python代码实现文本自动播放功能,主要有5步。第一步: 导入需要的依赖库。这里面主要用到两个库: (1)requests库: 作用是利用百度接口将文本解析为音频(2)os库: 用于播放音频 第二步: 获取百度接口将文本解析的access_token。 主要是通过requests库的get方法获取access_token。 备注: ak和sk需要在百度云注册获取,请小伙伴们别忘了! 第三步:将文本解析为音频流。主要是通过requests库的get方法获取文本对应的音频流。 第四步:将解析的音频流保存。主要是通过文件操作的write方法将文本对应的音频流写入保存。 第五步: 播放保存的音频流文件。 具体的代码
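The extracted text ends right before the actual code, so here is a minimal sketch of the five steps. It assumes the standard Baidu short-text TTS REST endpoints (the token URL, the tsn.baidu.com/text2audio URL and its parameter names should be checked against the current Baidu Cloud documentation), and API_KEY/SECRET_KEY are placeholders for the ak/sk mentioned above:

# Step 1: import the two libraries
import os
import requests

API_KEY = "your_ak"       # ak from the Baidu Cloud console (placeholder)
SECRET_KEY = "your_sk"    # sk from the Baidu Cloud console (placeholder)

# Step 2: exchange ak/sk for an access_token
token_resp = requests.get(
    "https://aip.baidubce.com/oauth/2.0/token",
    params={
        "grant_type": "client_credentials",
        "client_id": API_KEY,
        "client_secret": SECRET_KEY,
    },
)
access_token = token_resp.json()["access_token"]

# Step 3: ask the TTS endpoint to synthesize the text into an audio stream
tts_resp = requests.get(
    "https://tsn.baidu.com/text2audio",
    params={
        "tex": "你好,这是一段自动播读的文本。",
        "tok": access_token,
        "cuid": "demo",   # any client identifier
        "ctp": 1,         # client type, fixed at 1 for the REST API
        "lan": "zh",      # language
    },
)

# Step 4: write the returned audio stream to a file
with open("speech.mp3", "wb") as f:
    f.write(tts_resp.content)

# Step 5: play the saved file (Windows; use another player command on other systems)
os.startfile("speech.mp3")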

    Python速查表

    基础

    神经网络

    线性代数

    python基础

    scipy科学计算

    spark

    数据保存及可视化

    numpy

    pandas

    bokeh

    画图

    matplotlib

    ggplot

    机器学习

    sklearn

    keras

    tensorflow

    算法

    数据结构

    复杂度

    排序算法

    欧拉定理和图论

    图式理论起源于18世纪,有一个有趣的故事。 柯尼斯堡是历史上普鲁士(今俄罗斯)的一个城市,有7座桥梁横跨普雷格尔河。 有人问: 有没有可能绕着柯尼斯堡走一圈,正好穿过每座桥一次? 请注意,我们正好在开始的地方完成,这并不重要。 我们将在学习了一些术语后再来讨论这个问题。 Graph图是一种数学结构,由以下部分组成。 顶点(也叫节点或点)(V),它们通过以下方式相连边(也叫链接或线)(E)它被表示为G = (V, E)Degree一个特定顶点的边的数量被称为它的degree 一个有6个顶点和7条边的图图在计算机科学中通常用来描述不同对象之间的关系。 例如,Facebook作为一个图表示不同的人(顶点)和他们的关系(边)。 同样地,维基百科的编辑(边)在夏天的一个月里对不同的维基百科语言版本(顶点)做出了贡献,可以被描述为一个图,如下所示。 图片来自维基百科以上面的柯尼斯堡为例,这个城市与河流及其桥梁可以用图示来描述,如下所示。 欧拉首先用图形表示上述图表,如下图所示。 他用一个顶点或节点来描述每块土地,用一条边来描述每座桥。 柯尼斯堡的普雷格尔河的图形表示(图片来自维基百科)欧拉提出了一个定理,指出: 如果除了最多两座桥之外,所有的桥都有一个 "偶数度",那么一个城市的桥就可以准确地被穿越一次。 看一下代表柯尼斯堡的图,每个顶点都有一个奇数,因此不可能绕着城市走,准确地穿过每座桥一次。 这个定理催生了现代图论,即对图的研究。 图和类型 有向图/二维图一个边有方向性的图。 这意味着,一条边只能在一个方向上穿过。 例如,一个代表Medium通讯和其订阅者的图。 无向图一个边没有方向的图。 这意味着一条边可以双向穿越。 例如,一个代表Facebook上朋友之间关系的图。 循环循环是一个图形,它的一些顶点(至少3个)以封闭链的形式连接。 循环图它是一个至少有一个周期的图。 一个有向循环图非循环图它是一个没有循环的图。 连接图它是一个具有从任何顶点到另一个顶点的边的图形。 它可以是: 强连接: 如果所有顶点之间存在任何双向的边连接弱连接: 如果所有顶点之间没有双向的连接 无连接的图形 一个没有连接顶点的图形被称为断开连接的图形。 中心性算法中心性算法可用于分析整个图,以了解该图中的哪些节点对网络的影响最大。 然而, 要用算法衡量网络中节点的影响力,我们必须首先定义“影响力”在图上的含义。 这因算法而异,并且在尝试决定选择哪种中心性算法时是一个很好的起点。 度中心性使用节点的平均度来衡量它对图的影响有多大Closeness Centrality使用 给定节点与所有其他节点之间的反距离距离来了解节点在图中的中心程度Betweenness Centrality使用最短路径来确定哪些节点充当图中的中心“桥梁”,以识别网络中的关键瓶颈PageRank使用一组随机游走来衡量给定节点对网络的影响力。 通过测量哪些节点更有可能在随机游走中被访问。 请注意,PageRank 通过偶尔跳到图中的随机点而不是直接跳跃来解决随机游走面临的断开连接的图问题。 这允许算法探索图中甚至断开连接的部分。 PageRank 以谷歌创始人拉里佩奇的名字命名,被开发为谷歌搜索引擎的支柱,并使其在互联网的早期阶段超越了所有竞争对手的表现。 度中心性使用节点的平均度来衡量它对图的影响有多大Closeness Centrality使用 给定节点与所有其他节点之间的反距离距离来了解节点在图中的中心程度。 寻路和搜索算法 另一个基础图算法家族是图最短路径算法。 正如我们在关于图遍历算法(又名寻路算法)的文章中探讨的那样,最短路径算法通常有两种形式,具体取决于问题的性质以及您希望如何探索图以最终找到最短路径。 深度优先搜索,首先尽可能深入地遍历图形,然后返回起点并进行另一次深度路径遍历广度优先搜索,使其遍历尽可能靠近起始节点,并且只有在耗尽最接近它的所有可能路径时才冒险深入到图中寻路被用在许多用例中,也许最著名的是谷歌地图。 在 GPS 的早期,谷歌地图使用图表上的寻路来计算到达给定目的地的最快路线。 这只是无数人使用图表解决日常问题的众多例子之一。 图数据科学中的深度优先搜索和广度优先搜索示例维基百科对Dijkstra 算法的说明 深度优先搜索深度优先搜索,首先尽可能深入地遍历图形,然后返回起点并进行另一次深度路径遍历广度优先搜索,使其遍历尽可能靠近起始节点,并且只有在耗尽最接近它的所有可能路径时才冒险深入到图中。 寻路被用在许多用例中,也许最著名的是谷歌地图。 在 GPS 的早期,谷歌地图使用图表上的寻路来计算到达给定目的地的最快路线。 这只是无数人使用图表解决日常问题的众多例子之一。 深度优先搜索(DFS)是一种搜索图数据结构的算法。 该算法从根节点开始,在回到起点之前尽可能地沿着每个分支进行探索。 深度优先搜索可以在Python中定义如下。 这里我们定义了一个Node类,其构造函数定义了它的子节点(连接的顶点)和名称。 addChild方法向节点添加新的子节点。 depthFirstSeach方法递归地实现了深度优先搜索算法。 class Node: def __init__(self, name): self.children = [] self.name = name def addChild(self, name): self.children.append(Node(name)) return self def depthFirstSearch(self, array): array.append(self.name) for child in self.children: child.depthFirstSearch(array) return array 广度优先搜索 广度优先搜索(BFS)是另一种搜索图数据结构的算法。 Breadth-first search 它从根节点开始,在继续搜索其他分支的节点之前,探索目前分支的所有节点。 该算法可以用Node类的breadthFirstSearch方法定义如下。 class Node: def __init__(self, name): self.children = [] self.name = name def addChild(self, name): self.children.append(Node(name)) return self def breadthFirstSearch(self, array): # Write your code here. 
queue = [self] while len(queue)> 0: current = queue.pop(0) array.append(current.name) for child in current.children: queue.append(child) return array 图算法: 令人惊讶的用例多样性解释 荐书 附文: 图算法家谱补充肖恩·罗宾逊 (Sean Robinson),MS / 首席数据科学家作者: Sean Robinson,MS / 首席数据科学家 图算法家谱 社区检测算法社区检测是各种图形的常见用例。 通常,它用于理解图中不同节点组为用例提供一些有形价值的任何情况。 这可以是社交网络中的任何东西,从运送货物的卡车车队到相互交易的账户网络。 但是,您选择哪种算法来发现这些社区将极大地影响它们的分组方式。 Triangle Count简单地使用了三个完全相互连接的节点(如三角形)的原理,这是图中可以存在的最简单的社区动态。 因此,它会找到图中三角形的每个组合,以确定这些节点如何组合在一起强连通分量和连通分量(又名弱连通分量)是确定图形形状的优秀算法。 两者都旨在衡量有多少图表构成了全部数据。 连通分量仅返回一组节点和边中完全断开连接的图的数量,而强连通分量返回那些通过许多链接牢固连接的子图。 正因为如此,在首次分析图形数据时,它们通常被组合用作初始探索性数据分析的一种形式Louvain Modularity通过将节点和边的集群与网络的平均值进行比较来找到社区。 如果发现一组节点通常大于图中看到的平均数,则这些节点可以被视为一个社区。 结论在本文中,作为一种图算法备忘单,我们只是触及了数据科学中最常见的图算法(又名图算法)的皮毛,这些算法可用于利用图必须为数据提供的互连功能分析。 例如,在未来的 artciels 中,我们还将更多地关注图搜索算法等。 我们研究了最基本的图论算法,它们作为更复杂的图算法的构建块,并检查了那些可以解决许多用例的各种问题的复杂算法。 无论是 Neo4j 图数据库算法还是任何其他图数据库,都是如此。
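Assuming depthFirstSearch and breadthFirstSearch live on one and the same Node class (in the text they are shown as two separate class snippets), a quick usage sketch of this API:

# Build a small tree with the addChild method defined above
root = Node("A")
root.addChild("B").addChild("C")   # addChild returns self, so B and C are both children of A
root.children[0].addChild("D")     # D becomes a child of B

print(root.depthFirstSearch([]))   # ['A', 'B', 'D', 'C'] - goes deep before moving to siblings
print(root.breadthFirstSearch([])) # ['A', 'B', 'C', 'D'] - visits the tree level by level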

    10个最难的 Python 概念

    了解 Python 中 OOP、装饰器、生成器、多线程、异常处理、正则表达式、异步/等待、函数式编程、元编程和网络编程的复杂性 这些可以说是使用 Python 学习最困难的概念。 当然,对某些人来说可能困难的事情对其他人来说可能更容易。 面向对象编程 (OOP):对于初学者来说,理解类、对象、继承和多态性的概念可能很困难,因为它们可能是抽象的。 OOP 是一种强大的编程范式,允许组织和重用代码,并广泛用于许多 Python 库和框架中。 例子:创造一个狗的类: class Dog: def __init__(self, name, breed): self.name = name self.breed = breed def bark(self): print("Woof!") my_dog = Dog("Fido", "Golden Retriever") print(my_dog.name) # "Fido" my_dog.bark() # "Woof!":“汪!” 装饰器: 装饰器可能很难理解,因为它们涉及函数对象和闭包的操作。 装饰器是 Python 的一个强大特性,可用于为现有代码添加功能,常用于 Python 框架和库中。 例子: def my_decorator(func): def wrapper(): print("调用func之前.") func() print("调用func之后.") return wrapper @my_decorator def say_whee(): print("Whee!") say_whee() 生成器表达式和 yield: 理解生成器函数和对象是处理大型数据集的一种强大且节省内存的方法,但可能很困难,因为它们涉及迭代器的使用和自定义可迭代对象的创建。 例子:生成器函数 # generator function def my_gen (): n = 1 print ( 'This is first printed' ) yield n n += 1 print ( 'This is printed second' ) yield n n += 1 print ( 'This is printed first ' ) yield n #在my_gen()中 #使用 for 循环 for item : print (item) 多线程:多线程可能很难理解,因为它涉及同时管理多个执行线程,这可能很难协调和同步。 例子: import threading def worker (): """thread worker function""" print (threading.get_ident()) threads = [] for i in range ( 5 ): t = threading.Thread(target=worker) threads.append( t) t.start() 异常处理: 异常处理可能难以理解,因为它涉及管理和响应代码中的错误和意外情况,这可能是复杂和微妙的。 例子: try: x = 1 / 0 except ZeroDivisionError as e: print("Error Code:", e) Case 2nd:使用raise_for_status(),当你对API的调用不成功时,引发一个异常: import requests response = requests.get("https://google.com" response.raise_for_status() print(response.text) # <!doctype html><html itemscope=""itemtype="http://schema.org/WebPage" lang="en-IN"><head><meta content="text ... response = requests.get("https://google.com/not-found") response.raise_for_status() # requests.exceptions.HTTPError:404 Client Error: Not Found for url: https://google.com/not-found 正则表达式:正则表达式可能难以理解,因为它们涉及用于模式匹配和文本操作的专门语法和语言,这可能很复杂且难以阅读。 例子: import re string = "The rain in Spain" x = re.search( "^The.*Spain$" , string ) if x: print ( "YES! We have a match!" ) else : print ( "No match" ) 异步/等待:异步和等待可能很难理解,因为它们涉及非阻塞 I/O 和并发的使用,这可能很难协调和同步。 例子: import asyncio async def my_coroutine (): print ( "我的协程" ) await my_coroutine() 函数式编程:函数式编程可能很难理解,因为它涉及一种不同的编程思维方式,使用不变性、一流函数和闭包等概念。 例子: from functools import reduce my_list = [1, 2, 3, 4, 5] result = reduce(lambda x, y: x*y, my_list) print(result) 元编程:元编程可能难以理解,因为它涉及在运行时对代码的操作,这可能是复杂和抽象造成的。 例子: class MyMeta(type): def __new__(cls, name, bases, dct): x = super().__new__(cls, name, bases, dct) x.attribute = "example" return x class MyClass(metaclass=MyMeta): pass obj = MyClass() print(obj.attribute) 网络编程: 网络编程可能很难理解,因为它涉及使用套接字和协议在网络上进行通信,这可能是复杂和抽象的。 例子: import socket s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) s.bind(("127.0.0.1", 3000)) s.listen()
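Two of the snippets above do not run as extracted: the generator example lost its "for item in my_gen():" line, and "await my_coroutine()" is used outside of any event loop. Minimal runnable versions:

# Generator: drive my_gen() with a for loop
def my_gen():
    n = 1
    print('This is printed first')
    yield n
    n += 1
    print('This is printed second')
    yield n
    n += 1
    print('This is printed last')
    yield n

for item in my_gen():
    print(item)

# async/await: a coroutine has to be scheduled on an event loop, e.g. with asyncio.run()
import asyncio

async def my_coroutine():
    print("my coroutine")

asyncio.run(my_coroutine())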

Essential command-line commands for Python

一、Python environment
1. Check the version: python -V or python --version
2. Find the installation directory: where python

二、pip commands
requirements.txt records the third-party packages (and their versions) a project depends on, so the whole team installs the same versions.
Upgrade pip: python -m pip install --upgrade pip
1. Install packages: pip install <package> or pip install -r requirements.txt
   Switch to a mirror: pip install <package> -i https://pypi.tuna.tsinghua.edu.cn/simple
   Install a local .whl package: pip install <dir>/<filename> or pip install --use-wheel --no-index --find-links=wheelhouse/<package>; e.g. pip install requests-2.21.0-py2.py3-none-any.whl (run pip from the directory containing the .whl file, or pass its full path)
   Upgrade a package: pip install -U <package> or pip install <package> --upgrade; e.g. pip install urllib3 --upgrade
2. Uninstall packages: pip uninstall <package> or pip uninstall -r requirements.txt; e.g. pip uninstall requests
3. List installed packages with versions: pip freeze; export them with pip freeze > requirements.txt
4. List installed packages: pip list; list upgradable packages: pip list -o
5. Show a package's install location and metadata: pip show <package>, e.g. pip show requests
6. Search for packages: pip search <keyword>, e.g. pip search requests lists packages related to requests; alternatively install pip-search (pip install pip-search) and run pip_search requests
7. Build a wheel: pip wheel <package>, e.g. pip wheel requests; the resulting requests-2.21.0-py2.py3-none-any.whl appears in the current folder

    Digital 时钟

    from tkinter import * from tkinter.ttk import * from time import strftime root = Tk() root.title('Clock') def time(): string = strftime('%H:%M:%S %p') lbl.config(text = string) lbl.after(1000, time) lbl = Label(root, font = ('franklin gothic', 40, 'bold'), background = 'black', foreground = 'white') lbl.pack(anchor = 'center') time() mainloop()

Progress bar

Read the system date and show, as a progress bar, what share of the current year has already passed.

import datetime
from time import sleep
from tqdm import tqdm

# 1. Count the days between today and January 1 of the current year
def days_elapsed():
    current = datetime.datetime.now()
    start = datetime.datetime(current.year, 1, 1)
    return (current - start).days

print(days_elapsed())

# 2. tqdm draws the bar: ncols sets the bar width, desc sets the label
def progress_bar():
    for i in tqdm(range(0, 365), ncols=100, desc="Share of this year already passed"):
        sleep(0.01)
        if i == days_elapsed():
            return

progress_bar()

    顶级的python游戏库

Pygame, Pyglet, PyOpenGL, Arcade, Panda3D
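As a starting point for the first library in the list, here is a minimal Pygame sketch that opens a window and runs an event loop until it is closed (the window size and title are arbitrary choices of mine):

import pygame

pygame.init()
screen = pygame.display.set_mode((640, 480))   # arbitrary window size
pygame.display.set_caption("Minimal Pygame window")

running = True
while running:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:          # window close button
            running = False
    screen.fill((30, 30, 30))                  # dark grey background
    pygame.display.flip()

pygame.quit()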

    剖析 Python collections模块

    本文剖析 Python 的collections模块提供有用的数据类型,可以改进和缩短许多用例的代码。 我们将介绍和讨论五种最常用的类型: deque Counter defaultdict OrderedDict namedtuple collections 模块就是其中之一,我和许多其他开发人员经常使用这个功能强大的包。 我几乎每天都在工作中使用defaultdictCounter,它们产生了许多优雅的单行代码。 但OrderedDict没有。 当我在做Leetcode deque挑战时遇到困难时,它提供了很大的帮助——并且通常对构建代码和提高可读性很有用。 这种频繁的使用和强大的功能激发了我写这篇文章的灵感,我希望它也能在你的编码生活中变得有用——并且成为标准。 事不宜迟,让我们开始吧。 1、双向队列 DequeDeque代表“双端队列”,顾名思义,允许从任一端高效地输入/删除数据(与标准队列相比,标准队列使用 FIFO(先进先出)方法,即只允许插入结尾,并从开头删除)。 deque在 O(1) 中从队列的任一侧提供插入/删除操作。 这与在 Python 中使用本机列表形成对比,后者也可用于此目的。 但是,列表仅支持在(摊销)O(1) 的末尾插入/移除,而在开头的插入/移除是 O(n)。 此外,列表在底层实现为数组。 这会产生一些空间开销,因为数组被预先分配给特定的大小和时间。 因此,当空间用完时,必须将元素复制到不同的位置分摊O(1)。 相比之下,deques 被实现为双向链表,从而产生上述属性。 Python snippet-> from collections import deque ### Native list: 在开头添加/删除元素时不要使用 standard_list = [ 1 , 2 , 3 ] # 从结尾添加/删除:O(1) standard_list.append( 4 ) last_el = standard_list .pop() # 默认参数是 -1,最后一项 print ( f"Removed last element: {last_el} " ) # 从开头添加/删除:O(n) standard_list.insert( 0 , 0 ) # 语法:插入(pos, val) print ( f"移除第一个元素:{standard_list.pop( 0 )} " ) deque 更高效 Python snippet-> ### deque: 更高效 my_deque = deque([ 1 , 2 , 3 ]) # 添加/删除末尾:O(1) my_deque.append( 4 ) last_el = my_deque. pop() # pop() 从右侧弹出 print ( f"Removed last element: {last_el} " ) # 从开头添加/删除:O(1) my_deque.appendleft( 0 ) print ( f"Removed first element: { my_deque.popleft()} " ) CountersCounters本质上是dicts包含对象的快捷方式:元素的键映射到元素的计数。 Counter您可以像这样初始化一个新的: from collections import Counterimport Counter fruits = Counter({ 'apple' : 4 , 'pear' : 2 , 'orange' : 0 }) 然后我们可以更新我们的计数如下: fruits.update({ 'orange' : 2 , 'apple' : - 1 }) print ( f"My fruit collection: {fruits} " ) 在这里我们看到update()确实更新了计数(即,从我们的集合中减去一个苹果)而不是将元素数设置为给定值。 另一个有用的函数是most_common(n),它返回 n 个最频繁出现的元素: print ( f"我最常吃的两个水果:{fruits.most_common( 2 )} " ) 此外,Counter允许从任何合适的可迭代对象进行初始化,通常会产生紧凑的单行代码。 例如,我们可以从“原始”数据初始化它,如下所示: fruits = Counter([ "apple" , "apple" , "pear" , "apple" , "apple" , "pear" ]) 这也经常用于字符串,特别是计算字符的出现次数: from collections import Counter char_counts = Counter( "Hello world!" ) print ( f"字符数:{char_counts} " ) 2、defaultdict 指令defaultdict是在向字典添加新键时删除烦人的初始化代码的好方法。 它使用默认工厂进行初始化,每当发生这种情况时都会在内部调用它。 考虑以下非常常见的示例:您想为每个键存储一个值列表,因此,无论何时添加一个新的、尚不存在的键,都必须初始化空列表。 这就是它的样子: my_dict = {} key_value_pairs_to_insert = [( "a" , 0 ), ( "b" , 1 ), ( "a" , 2 )] for key, val in key_value_pairs_to_insert: if key not in my_dict: my_dict[key] = [ ] my_dict[key].append(val) print ( f"Resulting dict: {my_dict} ." 
) 我们可以缩短,特别是删除“key not in dict”检查,使用defaultdict: from collections import defaultdict my_dict = defaultdict( list ) key_value_pairs_to_insert = [( "a" , 0 ), ( "b" , 1 ), ( "a" , 2 )] for key, val in key_value_pairs_to_insert: my_dict[key].append (val) print ( f"结果字典:{my_dict}。" ) 3、有序字典OrderedDict 引用官方文档,是“一个能记住添加的顺序元素的字典”,因此,它提供了以下功能: popitem(last=True), 它返回添加的最后一个或第一个项目 move_to_end(), 将所选项目移动到字典的末尾 让我们看一下 Python 代码: from collections import OrderedDict # 添加4个元素: ordered_dict = OrderedDict() ordered_dict[ "a" ] = 0 ordered_dict[ "b" ] = 1 ordered_dict[ "c" ] = 2 ordered_dict[ "d" ] = 3 弹出第一个和最后一个元素: # pop first and last item print ( f"Pop first element: {ordered_dict.popitem(last= False )} " ) print ( f"Pop last element: {ordered_dict.popitem()} " ) 交换 "b" / " 的插入顺序 # 交换 "b" / " 的插入顺序 ordered_dict.move_to_end( "b" ) print ( f"弹出最后一个元素: {ordered_dict.popitem()} " ) 题外话,请注意这个数据结构为Leetcode 著名的 LRU 缓存问题提供了一个简单的解决方案(当然,理解底层原理仍然很好,即在内部这个结构是作为一个指向双精度的字典实现的- 节点链表)。 在此问题中,您的任务是实现“LRU(最近最少使用)缓存”,这意味着缓存仅保留最近使用的 N 个元素并在 O(1) 中提供插入/删除运算符。 示例解决方案可以像这样简短: from collections import OrderedDict class LRUCache ( OrderedDict ): def __init__ ( self, capacity: int ): self.capacity = capacity def get ( self, key: int ) -> int : if key in self: self.move_to_end(key) return self[key] else : return - 1 def put ( self, key: int , value: int ) ->None : if key in self: self.move_to_end(key) self[key] = value if len (self) > self.capacity: self.popitem(last= False ) 4、命名元组 namedtuple 元组tuple是 Python 中重要的数据结构,并且是list的不可变等价物。 由于这个属性,它们为开发人员提供了辨识内部元素不可变的特性(指示const性),使用起来更快,内存效率更高。 因此,建议尽可能在列表上使用它们。 请注意,return x, y, z实际上已经返回了一个元组,而不是列表——可能是这个原因,也许是函数以元组形式返回若干个值不易被误修改?使用它们的一个缺点(然而,这也适用于列表)是数据访问的匿名性和相应的容易出错。 例如t[n]访问第 n 个元素,问题通过 namedtuple来缓解,顾名思义,它允许命名元素的数据类。 namedtuple可以按如下方式使用: from collections import namedtuple Point = namedtuple( 'Point' , [ 'x' , 'y' ]) p = Point( 0 , y= 1 ) print( f"x: {px} , {p[ 1 ]} " ) 正如我们所见,我们首先声明了一个新的类型namedtuplenamed Point,其属性为x和y。 链接👇介绍运用例子: 数据结构:namedtuple应用场景 引入适用场景 测量两个点的距离,一个点描述为两个数字x,y,作为单独的对象来处理。 点经常被写在括号里,用逗号隔开方向。 例如,(0, 0)是起始点,(x, y)是向右移动x个单位,从起始点向上移动y个单位的指南。 正常任务可能是确定一个点与起点的距离,或与另一个点的距离,或发现两个点的中点,或询问一个点是否落在一个给定的方形或圆形内。 我们会在眨眼间察觉到答案。 如何运用数学描述将这些与信息协调起来?在Python中解决一个点的一个特征方法,简单的安排是利用一个元组,对于某些应用来说,这可能是个不错的决定。 另一个选择是描述另一个类。 这种方法包括更多的努力,然而,它的好处很快就会显现出来。 我们需要我们的焦点有一个x和一个y的特性,所以我们的顶层定义类似于这样。 有3个任务场景 代码见链接 👇然后我们实例化该类型的一个新实例,并可以在构造函数中使用位置参数或命名参数。 此外,为了访问属性,我们可以使用属性的名称或简单的索引,类似于普通元组。
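The linked code for the three Point tasks described above is not included here; a minimal sketch of those tasks built on the Point namedtuple (the helper function names are my own):

from collections import namedtuple
from math import hypot

Point = namedtuple('Point', ['x', 'y'])

def distance(p1, p2=Point(0, 0)):
    """Distance between two points; defaults to the distance from the origin."""
    return hypot(p1.x - p2.x, p1.y - p2.y)

def midpoint(p1, p2):
    """The point halfway between p1 and p2."""
    return Point((p1.x + p2.x) / 2, (p1.y + p2.y) / 2)

def in_circle(p, center, radius):
    """Does p fall inside (or on) a circle with the given center and radius?"""
    return distance(p, center) <= radius

p = Point(3, 4)
print(distance(p))                   # 5.0, distance from the origin
print(midpoint(p, Point(1, 2)))      # Point(x=2.0, y=3.0)
print(in_circle(p, Point(0, 0), 6))  # True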

    30+ snippet



    补充上一期任务中用到set()的时间复杂度问题。 具体问题见下面的链接👇 计算思维第25篇 无处不在的启发 set()它是一个哈希表,实现方式与Python的dict非常相似,有一些优化,利用了值总是空的事实,在一个集合中,我们只关心键。 有同学问到集合操作确实需要对至少一个操作数表进行迭代 ,在union的情况下都是如此。 迭代时间复杂度O(n),成员测试平均是O(1) 所以对于两个大小为m和n的集合,操作的平均成本为。 合并:O(m+n) 交集:O(min(m,n)) 差集:O(m) 子集:O(m) 使用虚拟环境:这表明您隔离环境以避免依赖项和包版本出现问题。 此外,如果我们坚持要求和约束文件,这表明我们关心应用程序应该如何在另一个地方运行: pip install -r requirements.txt -c constraints.txt 使用变量的类型提示:类型提示使我们的代码更具可读性。 这有助于简化维护和调试。 def demo(x: type_x, y: type_y, z: type_z= 100) -> type_return : 正确的异常处理: 捕获特定的异常可以很容易地理解我们的代码中出了什么问题。 Bareexcept子句捕获所有异常,包括SystemExitKeyboardInterrupt。 但两者并不相同,应区别对待。 Try 和 Except Python 中异常处理的高级技术 作为 Python 开发人员,您可能熟悉使用 try 和 except 语句处理异常的基本方法。 但是您是否知道可以使用其他技术来使您的异常处理更加强大? 在本文中,我们将超越基础知识,探索一些在 Python 中处理异常的高级技术。 首先,让我们看一下finally声明。 此语句允许您指定无论是否引发异常都将执行的代码块。 例如,假设您有一个文件,您需要在使用完它后将其关闭。 您可以使用 try-except 块来捕获在处理文件时可能发生的任何错误,但如果文件未正确关闭怎么办?这就是 finally 语句出现的地方。 这是一个例子: try: my_file = open("my_file.txt", "r") # do some file operations except FileNotFoundError: print("File not found.") finally: my_file.close() 在此示例中,无论如何都会关闭文件,无论是否引发异常。 这在处理需要清理的资源(如文件、数据库连接和网络套接字)时特别有用。 另一种高级技术是使用多个except 块来处理不同类型的异常。 例如,您可能希望以不同于PermissionError的方式处理FileNotFoundError。 这是一个例子: try: my_file = open("my_file.txt", "r") # do some file operations except FileNotFoundError: print("File not found.") except PermissionError: print("Permission denied.") 在此示例中,如果引发 FileNotFoundError,将执行第一个 except 块,如果引发 PermissionError,将执行第二个 except 块。 这允许您以更具体和更有针对性的方式处理不同类型的异常。 最后,您还可以使用该语句引发您自己的异常raise。 例如,您可能希望在未满足特定条件时引发异常。 这是一个例子: def divide(a, b): if b == 0: raise ZeroDivisionError("Cannot divide by zero.") return a / b try: result = divide(5, 0) print(result) except ZeroDivisionError as e: print(e) 在此示例中,如果除数为零,则会引发 ZeroDivisionError 并显示消息“不能被零除”。 这对于发出未满足特定条件并且程序无法继续的信号很有用。 总之,除了基本的 try 和 except 语句之外,Python 中还有许多用于异常处理的高级技术。 通过使用finally语句、多个 except 块和 raise 语句,您可以使您的异常处理更加强大和具体,并且您的代码更加健壮和可维护。 除了基本的 try 和 except 语句之外,还有一些更高级的 Python 异常处理技术 使用else子句:else子句可以与 try-except 块结合使用,它允许您指定只有在 try 块中没有引发异常时才会执行的代码块。 使用assert语句:assert语句可用于检查代码中的某些条件,如果不满足条件则引发异常。 这对于在代码中尽早发现错误很有用,以免它们导致更严重的问题。 使用with语句:with语句可用于自动处理资源的设置和清理,例如文件和网络连接。 这可以使您的异常处理更加简洁和易于阅读。 debug 变量 可能是我们不能在 Python 中重新分配的唯一变量 该__debug__变量是一个布尔值,当我们正常运行 Python 脚本时通常为 True。 但是,如果我们使用-O标志运行我们的 Python 脚本,__debug__则设置为 Falsepython3 -O yourscript.py__debug__为 False 时,assert 语句将被忽略。 assert 1==2 ^ 如果我们正常运行上面的代码,我们会得到一个 AssertionError。 但是,如果我们使用-O标志运行它,__debug__设置为 False,并且 assert 语句将被忽略。 使用自定义异常类:您可以通过对内置Exception类进行子类化来创建自己的自定义异常类。 这允许您创建可以在代码中以不同方式处理的更具体的异常。 记录异常:您可以使用日志记录来跟踪和记录代码中发生的异常。 这对于调试和故障排除以及监控应用程序的运行状况很有用。 使用装饰器进行异常处理:您可以使用装饰器将函数与 try-except 块包装起来,以更优雅的方式处理异常,而不是为每个函数单独添加 try-except 块。 使用 contextlib 库:Python 的 contextlib 库提供了几个上下文管理实用程序,包括 contextmanager 装饰器,它可用于为需要设置和拆除的资源定义上下文管理器。 使用上下文管理器:这有助于资源管理并避免悬空连接或文件句柄。 with open("demo.txt", mode="w") as file: 使用if __name__ == "main" if __name__ == "main": 这可以避免在导入脚本时使用不必要的对象。 使用理解推导式:理解使代码不那么冗长和高效。 another_list = [ x for item in iterable] 枚举enumerate: 当我们不需要有条件地跳过枚举元素时手动处理索引很容易工作并且速度更快。 for idx, num in enumerate(nums) "".join(Iterable)字符串连接高频用法,用到烂熟! 使用 zip:在使用多个列表 zip 方法时,可以简化迭代,并且项目可以直接开箱即用。 for item1, item2 in zip(list1, list2) 正确缩进:遵循一致的缩进(四个空格)有助于提高可读性。 此外,如果脚本是从文本编辑器或文件 io 流中读取的,它会使用更轻松。 for ... else ...elsefor & while 一起使用:这是告诉所有项目都在其自然过程中处理的pythonic方式。 实际运用例子的链接见👇 自学编程24:For else循环判断,Switch-Case Statements Coming 新手上路 for ... else ... 
运用中的细微差别 使用in关键字检查字典键中的成员资格。 if key in my_dict: items在处理字典时使用该方法:这是一种更 pythonic 的访问键和项目的方式,而不是循环访问键,然后使用get或下标访问值。 for key, val in my_dict.items() 使用dict.get()方法:这避免了键不存在时的错误,并且还允许我们在键不存在时返回默认值。 isinstance在进行对象类型相等时使用关键字:在继承对象的情况下,比较对象类型将给出错误的结果。 运算is符检查这两个项目是否引用相同的对象(即存在于相同的内存位置)。 运算==符检查值 是否相等。 using is for equality 与 singleton: Using is to check None, True & False. 使用布尔值关键字: 这有助于简捷定义正确的操作流程 “or”、“and”、“not” 使用OR中检查不满足if判断条件时,会继续检查下一个,直到有满足时返回,否则检查所有的值:在运用AND 时,只要遇到第一个不满足即布尔值为False的时候,就不再检查其余条件。 优先考虑 and会比or缩短执行时间; 使用生成器:生成器节省内存并提高性能。 (expression for item in iterable) 使用函数式编程 ***** mapreducefilter:这可以提高代码效率。 squared = map(lambda x : x*x, [1, 2, 3]) 这比运行 for 循环更快! map也一个一个地加载每个项目,不像 for 循环将所有项目加载到内存中,这使得map的内存效率更高,尽可能地选择生成器。 👉10个python初学者技巧(6) map filter zip Python 提供了许多预定义的内置函数,终端用户只需调用这些函数就可以使用。 在本教程中,你将学习Python的三个最强大的函数:map()、filter()和reduce()。 Map() Map 对输入列表中的所有项目应用一个函数。 语法:map(function, iterable) 例子:让我们看看下面的例子 items = [1, 2, 3, 4, 5] add = [] for i in items: add.append(i+i) print(add) Output: [2, 4, 6, 8, 10] 我们可以使用map(),就像流水线操作一样 items = [1, 2, 3, 4, 5] add = list(map(lambda x: x+x, items)) print(add) Output: [2, 4, 6, 8, 10] reduce() Reduce()函数将提供的函数应用于'iterables',并返回一个单一的值,正如其名称所暗示的那样。 语法:reduce(function, iterables) 例子:让我们看看下面的例子 add=0 list = [1, 2, 3, 4] for num in list: add = add + num print(add) Output: 10 我们可以使用reduce() from functools import reduce add = reduce((lambda x, y: x + y), [1, 2, 3, 4]) print(add) 输出 10 filter() : filter()函数用于生成一个输出值列表,当该函数被调用时返回真值。 它的语法如下。 语法:filter (function, iterables) 让我们看看下面的例子 number_list =[1,2,3,4,5,6,7,8,9,10] less_than_5 = list(filter(lambda x: x < 5, number_list)) print(less_than_5) Output: [1, 2, 3, 4] Python Pro 使用装饰器:这有助于编写可重用的代码。 etc等装饰器@retry @properties @lru_cache有助于扩展代码的功能。 有像 Flask 这样的框架,它有@route装饰器。 Airflow有@task运算符。 使用正确的内置库:使用正确的内置包可以减少大量样板代码。 此外,由于特定的包在执行特定任务时效率很高。 使用正确的包是有意义的。 例如os vs pathlib& os.system vs subprocess使用正确的命名约定PEP-8是 Python 程序员中流行的指南,它告诉我们如何正确命名变量、类和函数。 坚持风格:坚持你选择的规则让重构变得轻而易举,并且以后更容易理解。 使用 Docstring 和注释:Docstring有助于让用户理解函数/模块打算做什么。 它的输入、输出,以及它们的类型、样本值。 此外,如果我们在前面加上单行注释并表达它的含义,这也会有所帮助。 拆包也是高频用法: first, second, third = [ ‘first’, ‘second’ , ‘third’] 使用日志而不是打印:日志允许我们拥有一致的消息,它还允许我们通过设置正确的日志级别来抑制/过滤消息。 一些不错的补充: 比较运算符链接— if a < b < c if a <b and b < c 使用 Fstring 进行格式化: 使用 F-Strings 更简单的调试方法 名字=“洛基” 年龄= 4 品种=“德国牧羊犬” name = "rocky" age = 4 breed = "german shepherd" print(f"{name=} {age=} {breed=}") 输出: name='rocky' age=4 breed='德国牧羊犬' 这让我们在调试时输入更少的东西。 详细见2个链接👇 Python基础 fstring"{}”内部双引号和单引号的区别 Python官方文档总结输出命令 f'string 用法 通过实现 __repr__ 方法定义对象的写打印表示 使用专门的子类,如: defaultdict、 Counter、 frozenset, from collections import defaultdict, Counter 使用深拷贝from copy import deepcopy 27、使用单独的文件进行配置任何INI、TOML、YAML、ENV 28、 webbrowser模块 Python 以编程方式打开我们的浏览器 导入网络浏览器 webbrowser.open_new("https://bing.com")注意 webbrowser模块是预安装的,所以我们不必使用 Pip 安装它。 运行上面的代码会在我们使用的任何浏览器上打开 bing.com 的新选项卡。 Python内置的中值函数 和其他统计相关的函数,从统计数据导入中位数 from statistics import median lis = [1, 5, 3, 4, 2] print(median(lis)) # 3该statistics模块是我们不必使用 Pip 安装的内置模块,它包含许多其他与统计相关的功能。 我不知道这一点,并且一直在使用np.median Numpy Python 内置分数 Python 标准库的一部分,我们不需要使用 Pip 安装它. 
from fractions import Fraction x = Fraction(1/2) print(x**2) # 1/4 Python不返回fractions模块中浮点数的内置分数。 locals() 和 globals() 内置 globals()函数允许我们检查我们有哪些全局变量,内置locals()函数允许我们检查我们在函数范围内有哪些局部变量。 x = 1 y = 2 def test(): z = 3 print(globals()) print(locals()) test() 输出: # globals() {'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <_frozen_importlib_external.SourceFileLoader object at 0x108a30dc0>, '__spec__': None, '__annotations__': {} , '__builtins__':<模块'builtins'(内置)>, '__file__':'/Users/lzl/Documents/repos/test/aa.py','__cached__':无, 'x':1, 'y': 2, 'test': <0x10896beb0 处的功能测试>} # locals() {'z': 3} 我们甚至可以使用globals()和分配全局/局部变量locals()。 但我建议你不要因为潜在的副作用。 x = 1 def test(): globals()["x"] = 5 test() print(x) # 5
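Two of the techniques listed above (wrapping a function in a try/except via a decorator, and contextlib's contextmanager) are described without code; a minimal sketch of both, with names such as log_exceptions chosen by me:

import logging
from contextlib import contextmanager
from functools import wraps

# A decorator that wraps any function in a try/except block and logs the error
def log_exceptions(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            logging.exception("Error in %s", func.__name__)
            raise
    return wrapper

@log_exceptions
def divide(a, b):
    return a / b

# A context manager that guarantees set-up and tear-down around a resource
@contextmanager
def opened(path, mode="r"):
    f = open(path, mode)
    try:
        yield f
    finally:
        f.close()

try:
    divide(1, 0)          # logged, then re-raised
except ZeroDivisionError:
    pass

with opened("demo.txt", "w") as file:
    file.write("hello")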

17 个短代码(代码示例见列表之后)



    交换变量值
    
    
    将列表中的所有元素组合成字符串
    
    
    查找列表中频率最高的值
    
    
    检查两个字符串是不是由相同字母不同顺序组成
    
    
    反转字符串
    
    
    反转列表
    
    
    转置二维数组
    
    
    链式比较
    
    
    链式函数调用
    
    
    复制列表
    
    
    字典 get 方法
    
    
    通过「键」排序字典元素
    
    
    For Else
    
    
    转换列表为逗号分割符格式
    
    
    合并字典
    
    
    列表中最小和最大值的索引
    
    
    移除列表中的重复元素
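The code for the titles listed above did not survive extraction, so here is a minimal sketch, one snippet per title (the sample values are my own):

from collections import Counter

# 1. Swap two variable values
a, b = 1, 2
a, b = b, a

# 2. Join all list elements into one string
s = " ".join(["Python", "is", "fun"])

# 3. Most frequent value in a list
most_common = Counter([1, 2, 3, 2, 2]).most_common(1)[0][0]      # 2

# 4. Check whether two strings are anagrams (same letters, different order)
is_anagram = Counter("listen") == Counter("silent")              # True

# 5. Reverse a string
reversed_str = "hello"[::-1]

# 6. Reverse a list
reversed_list = [1, 2, 3][::-1]

# 7. Transpose a 2D array
transposed = list(zip(*[[1, 2], [3, 4], [5, 6]]))                # [(1, 3, 5), (2, 4, 6)]

# 8. Chained comparison
x = 5
in_range = 0 < x < 10                                            # True

# 9. Chained function call: pick the function, then call it
result = (max if x > 0 else min)(3, 7)

# 10. Copy a list (shallow copy)
copy_of_list = list([1, 2, 3])

# 11. dict get with a default value
count = {"a": 1}.get("b", 0)                                     # 0

# 12. Sort dictionary items by key
sorted_items = sorted({"b": 2, "a": 1}.items())                  # [('a', 1), ('b', 2)]

# 13. for ... else: else runs only when the loop was not broken out of
for n in [1, 3, 5]:
    if n % 2 == 0:
        break
else:
    print("no even number found")

# 14. List to a comma-separated string
csv_line = ",".join(map(str, [1, 2, 3]))                         # '1,2,3'

# 15. Merge two dictionaries
merged = {**{"a": 1}, **{"b": 2}}

# 16. Index of the smallest and largest value in a list
nums = [4, 1, 9]
idx_min, idx_max = nums.index(min(nums)), nums.index(max(nums))  # 1, 2

# 17. Remove duplicates from a list (order preserved)
unique = list(dict.fromkeys([1, 2, 2, 3]))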
    
    
    

    Python 自动化脚本案列



    网络连通性 import platform,os,traceback #判断系统 def get_os(): os = platform.system() if os == "Windows": return "n" else: return "c" #ping判断,成功返回OK,否则Down def ping_ip2(ip_str): try: cmd = ["ping", "-{op}".format(op=get_os()), "1", ip_str] # print(cmd) output = os.popen(" ".join(cmd)).readlines() # print(output) flag = False for line in list(output): if not line: continue if str(line).upper().find("TTL") >= 0: flag = True break if flag: # print("%s OK\n"%(ip_str)) return "OK" else: # print("%s Down\n"%(ip_str)) return "Down" except Exception as e: print(traceback.format_exc()) 端口状态测试 #给定IP,给出端口,默认为22端口,可进行相应的传参。 import sys,os,socket def telnet_port_fun2(ip,port=22): s=socket.socket(socket.AF_INET,socket.SOCK_STREAM) res=s.connect_ex((ip,port)) s.close() if res==0: return 'OPEN' else: return 'CLOSE' 上传下载速率 from speedtest import Speedtest def Testing_Speed(net): download = net.download() upload = net.upload() print(f'下载速度: {download/(1024*1024)} Mbps') print(f'上传速度: {upload/(1024*1024)} Mbps') print("开始网速的测试 ...") #进行调用 net = Speedtest() Testing_Speed(net) paramiko交互 #paramiko是ansible重要模板之一,支持SSH2远程安全连接,支持认证及密钥方式。可以实现远程命 令执行、文件传输、中间SSH代理等功能 import paramiko cmd = "ls" task_info = "ps -aux" # 创建客户端对象 ssh = paramiko.SSHClient() # 接收并保存新的主机名,此外还有RejectPolicy()拒绝未知的主机名 ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy()) # hostname:目标主机地址,port:端口号,username:登录用户名,password:密码 ssh.connect(hostname="hostname", username="root", password="password", port=22) # 执行命令,timeout为此次会话的超时时间,返回的是(stdin, stdout, stderr)的三元组 stdin, stdout, stderr = ssh.exec_command(cmd, timeout=20) # 需要解码才能把返回的内容转换为正常的字符串形式 print(stdout.read().decode()) linux ssh测试 import subprocess def scan_port(ip, user, passwd): cmd = "id" # try:'{CMD}' COMMAND = "timeout 10 sshpass -p '{PASSWD}' ssh -o StrictHostKeyChecking=no {USER}@{IP} '{CMD}' ".format( PASSWD=passwd, USER=user, IP=ip, CMD=cmd) output = subprocess.Popen(COMMAND, shell=True, stderr=subprocess.PIPE, stdout=subprocess.PIPE) oerr = output.stderr.readlines() oout = output.stdout.readlines() oinfo = oerr + oout if len(oinfo) != 0: oinfo = oinfo[0].decode() else: oinfo = '未知异常.' 
if user in oinfo: res = "{USER}登录正常".format(USER=user) elif "reset" in oinfo: res = "没加入白名单" elif "Permission" in oinfo: res = "{USER}密码错误".format(USER=user) elif 'No route to host' in oinfo or ' port 22: Connection refused' in oinfo: res = '22端口不通' else: res = oinfo # print(res,'============',oinfo) return res 内存使用率 import psutil def mem_use(): print('内存信息:') mem=psutil.virtual_memory() #换算为MB memtotal=mem.total/1024/1024 memused=mem.used/1024/1024 mem_percent=str(mem.used/mem.total*100)+'%' print('%.3fMB'%memused) print('%.3fMB'%memtotal) print(mem_percent) CPU使用率 import psutil import os def get_cpu_mem(): pid = os.getpid() p=psutil.Process(pid) cpu_percent = p.cpu_percent() mem_percent = p.memory_percent() print("cpu:{:.2f}%,mem:{:.2f}%".format(cpu_percent,mem_percent)) 获取nginx访问量前十IP import matplotlib.pyplot as plt nginx_file = 'file_path' ip = {} # 筛选nginx日志文件中的IP。 with open(nginx_file) as f: for i in f.readlines(): s = i.strip().split()[0] lengh = len(ip.keys()) if s in ip.keys(): ip[s] = ip[s] + 1 else: ip[s] = 1 ip = sorted(ip.items(), key=lambda e: e[1], reverse=True) # 取前十: newip = ip[0:10:1] tu = dict(newip) x = [] y = [] for k in tu: x.append(k) y.append(tu[k]) plt.title('ip access') plt.xlabel('ip address') plt.ylabel('pv') # X 轴项的翻转角度: plt.xticks(rotation=70) # 显示每个柱状图的值 for a, b in zip(x, y): plt.text(a, b, '%.0f' % b, ha='center', va='bottom', fontsize=6) plt.bar(x, y) plt.legend() plt.show() 操作MySQL 方法1:查询 import pymysql # 创建连接 conn = pymysql.connect(host="127.0.0.1", port=3306, user='user', passwd='passwd', db='db_name', charset='utf8mb4') # 创建游标 cursor = conn.cursor() # 存在sql注入情况(不要用格式化字符串的方式拼接SQL) sql = "insert into USER (NAME) values('%s')" % ('zhangsan',) effect_row = cursor.execute(sql) # 正确方式一 # execute函数接受一个元组/列表作为SQL参数,元素个数只能有1个 sql = "insert into USER (NAME) values(%s)" effect_row1 = cursor.execute(sql, ['value1']) effect_row2 = cursor.execute(sql, ('value2',)) # 正确方式二 sql = "insert into USER (NAME) values(%(name)s)" effect_row1 = cursor.execute(sql, {'name': 'value3'}) # 写入插入多行数据 effect_row2 = cursor.executemany("insert into USER (NAME) values(%s)", [('value4'), ('value5')]) # 提交 conn.commit() # 关闭游标 cursor.close() 方法2:增删查改 # coding=utf-8 import pymysql from loguru import logger from urllib import parse from dbutils.pooled_db import PooledDB from sqlalchemy import create_engine class SqlHelper2(object): global host, user, passwd, port host = 'ip' user = 'root' passwd = 'passwd' port = 3306 def __init__(self,db_name): self.connect(db_name) def connect(self,db_name): self.conn = pymysql.connect(host=host, user=user, passwd=passwd,port=port, db=db_name, charset='utf8mb4') self.conn.ping(reconnect=True) # self.cursor = self.conn.cursor(cursor=pymysql.cursors.DictCursor) self.cursor = self.conn.cursor() def get_list(self,sql): try: self.conn.ping(reconnect=True)#解决超时问题 self.cursor.execute(sql) result = self.cursor.fetchall() self.cursor.close() except Exception as e: self.conn.ping(reconnect=True) self.cursor = self.conn.cursor() self.cursor.execute(sql) result = self.cursor.fetchall() return result def get_one(self, sql): self.cursor.execute(sql) result = self.cursor.fetchone() return result #提交数据 def modify(self,sql,args=[]): try: self.cursor.execute(sql,args) self.conn.commit() qk = "存入MySQL 成功" except Exception as e: # # 如果发生错误则回滚 qk = "存入MySQL 失败:"+str(e) self.conn.rollback() return qk def multiple(self,sql,args=[]): # executemany支持下面的操作,即一次添加多条数据 # self.cursor.executemany('sinsert into class(id,name) values(%s,%s)', [(1,'wang'),(2,'li')]) try: 
self.cursor.executemany(sql,args) self.conn.commit() qk = "存入MySQL 成功" except Exception as e: qk = "存入MySQL 失败:"+str(e) self.conn.rollback() return qk def create(self,sql,args=[]): self.cursor.execute(sql,args) self.conn.commit() return self.cursor.lastrowid def close(self): self.cursor.close() self.conn.close() xonsh python和shell交互 #Xonsh shell,为喜爱 Python 的 Linux 用户而打造。 #Xonsh 是一个使用 Python 编写的跨平台 shell 语言和命令提示符。 #它结合了 Python 和 Bash shell,因此你可以在这个 shell 中直接运行 Python 命令#(语句)。你甚至可以把 Python 命令和 shell 命令混合起来使用。 #pip install xonsh xonsh #启动 #shell 部分 >>>$GOAL = 'Become the Lord of the Files' >>>print($GOAL) Become the Lord of the Files >>>del $GOAL #python 部分 d = {'xonsh': True} d.get('bash', False) >>>False # cpu内存使用率展示 #pyecharts是百度开源软件echarts的python集成包,可根据需求绘制各类图形。 #折线图 Line from pyecharts.charts import Line import pandas as pd from pyecharts import options as opts import random #模拟数据,生成cpu使用率的折线图 x =list(pd.date_range('20220701','20220830')) y=[random.randint(10,30) for i in range(len(x))] z=[random.randint(5,20) for i in range(len(x))] line = Line(init_opts = opts.InitOpts(width ='800px',height ='600px')) line.add_xaxis(xaxis_data =x) line.add_yaxis(series_name = 'cpu使用率',y_axis = y,is_smooth=True) line.add_yaxis(series_name = '内存使用率',y_axis = z,is_smooth=True) #添加参数,title_opts设置图的标题 line.set_global_opts(title_opts = opts.TitleOpts(title ='CPU和内存使用率折线 图')) line.render()#生成一个render.html浏览器打开 #可根据上述脚本对linux主机采集的数据,存入到MySQL,最后通过python Django、flask、fastapi等web框架进行展示。 执行前需安装相应的依赖包:pip install xxx

    10 个杀手级自动化Python 脚本

    “自动化不是人类工人的敌人,而是盟友。 自动化将工人从苦差事中解放出来,让他有机会做更有创造力和更有价值的工作。 1、文件传输脚本 Python 中的文件传输脚本是一组用 Python 编程语言编写的指令或程序,用于自动执行通过网络或在计算机之间传输文件的过程。 Python 提供了几个可用于创建文件传输脚本的库和模块,例如套接字ftplib、smtplib 和paramiko 等。 下面是 Python 中一个简单的文件传输脚本示例,该脚本使用套接字模块通过网络传输文件: import socket # create socket s = socket.socket() # bind socket to a address and port s.bind(('localhost', 12345)) # put the socket into listening mode s.listen(5) print('Server listening...') # forever loop to keep server running while True: # establish connection with client client, addr = s.accept() print(f'Got connection from {addr}') # receive the file name file_name = client.recv(1024).decode() try: # open the file for reading in binary with open(file_name, 'rb') as file: # read the file in chunks while True: chunk = file.read(1024) if not chunk: break # send the chunk to the client client.sendall(chunk) print(f'File {file_name} sent successfully') except FileNotFoundError: # if file not found, send appropriate message client.sendall(b'File not found') print(f'File {file_name} not found') # close the client connection client.close() 此脚本运行一个服务器,该服务器侦听地址 localhost 和端口 12345 上的传入连接。 当客户端连接时,服务器从客户端接收文件名,然后读取文件的内容并将其以块的形式发送到客户端。 如果未找到该文件,服务器将向客户端发送相应的消息。 如上所述,还有其他库和模块可用于在python中创建文件传输脚本,例如使用ftp协议连接和传输文件的ftplib和用于SFTP/SSH文件传输协议传输的paramiko。 可以定制脚本以匹配特定要求或方案。 2、系统监控脚本 系统监视脚本是一种 Python 脚本用于监视计算机或网络的性能和状态。 该脚本可用于跟踪各种指标,例如 CPU 使用率、内存使用率、磁盘空间、网络流量和系统正常运行时间。 该脚本还可用于监视某些事件或条件,例如错误的发生或特定服务的可用性。 例如: import psutil # Get the current CPU usage cpu_usage = psutil.cpu_percent() # Get the current memory usage memory_usage = psutil.virtual_memory().percent # Get the current disk usage disk_usage = psutil.disk_usage("/").percent # Get the network activity # Get the current input/output data rates for each network interface io_counters = psutil.net_io_counters(pernic=True) for interface, counters in io_counters.items(): print(f"Interface {interface}:") print(f" bytes sent: {counters.bytes_sent}") print(f" bytes received: {counters.bytes_recv}") # Get a list of active connections connections = psutil.net_connections() for connection in connections: print(f"{connection.laddr} <-> {connection.raddr} ({connection.status})") # Print the collected data print(f"CPU usage: {cpu_usage}%") print(f"Memory usage: {memory_usage}%") print(f"Disk usage: {disk_usage}%") 此脚本使用psutil模块中的 cpu_percent: CPU 使用率 virtual_memory:内存使用率 disk_usage: 磁盘使用率。 函数分别检索当前: virtual_memory 函数返回具有各种属性的对象,例如内存总量以及已用内存量和可用内存量。 disk_usage 函数将路径作为参数,并返回具有磁盘上总空间量以及已用空间量和可用空间量等属性的对象。 3、网页抓取脚本最常用 此脚本可用于从网站中提取数据并以结构化格式,如电子表格或数据库存储数据。 这对于收集数据进行分析或跟踪网站上的更改非常有用。 例如: import requests from bs4 import BeautifulSoup # Fetch a web page page = requests.get("http://www.example.com") # Parse the HTML content soup = BeautifulSoup(page.content, "html.parser") # Find all the links on the page links = soup.find_all("a") # Print the links for link in links: print(link.get("href")) 可以看到BeautiulSoup的强大功能。 您可以使用此包找到任何类型的 dom 对象,因为我已经展示了如何找到页面上的所有链接。 您可以修改脚本以抓取其他类型的数据,或导航到站点的不同页面。 还可以使用 find 方法查找特定元素,或使用带有其他参数的 find_all 方法来筛选结果。 4、电子邮件自动化脚本 此脚本可用于根据特定条件自动发送电子邮件。 例如,您可以使用此脚本向团队发送每日报告,或者在重要截止日期临近时向自己发送提醒。 下面是如何使用 Python 发送电子邮件的示例: import smtplib from email.mime.text import MIMEText # Set the SMTP server and login credentials smtp_server = "smtp.gmail.com" smtp_port = 587 username = "your@email.com" password = "yourpassword" # Set the email parameters recipient = "recipient@email.com" subject = "Test email from Python" body = "This is a test email sent from Python." 
# Create the email message msg = MIMEText(body) msg["Subject"] = subject msg["To"] = recipient msg["From"] = username # Send the email server = smtplib.SMTP(smtp_server, smtp_port) server.starttls() server.login(username, password) server.send_message(msg) server.quit() 此脚本使用 smtplib 和电子邮件模块通过简单邮件传输协议 SMTP 发送电子邮件。 来自smtplib模块的SMTP类用于创建SMTP客户端,starttls和登录方法用于建立安全连接,电子邮件模块中的MIMEText类用于创建多用途Internet邮件扩展MIME格式的电子邮件。 MIMEText 构造函数将电子邮件的正文作为参数,您可以使用 setitem 方法来设置电子邮件的主题、收件人和发件人。 创建电子邮件后,SMTP 对象的send_message方法将用于发送电子邮件。 然后调用 quit 方法以关闭与 SMTP 服务器的连接。 5、密码管理器脚本: 密码管理器脚本是一种用于安全存储和管理密码的 Python 脚本。 该脚本通常包括用于生成随机密码、将哈希密码存储在安全位置如数据库或文件以及在需要时检索密码的函数。 import secrets import string # Generate a random password def generate_password(length=16): characters = string.ascii_letters + string.digits + string.punctuation password = "".join(secrets.choice(characters) for i in range(length)) return password # Store a password in a secure way def store_password(service, username, password): # Use a secure hashing function to store the password hashed_password = hash_function(password) # Store the hashed password in a database or file with open("password_database.txt", "a") as f: f.write(f"{service},{username},{hashed_password}\n") # Retrieve a password def get_password(service, username): # Look up the hashed password in the database or file with open("password_database.txt") as f: for line in f: service_, username_, hashed_password_ = line.strip().split(",") if service == service_ and username == username_: # Use a secure hashing function to compare the stored password with the provided password if hash_function(password) == hashed_password_: return password return None 上述示例脚本中的generate_password 函数使用字母、数字和标点字符的组合生成指定长度的随机密码。 store_password函数将服务,如网站或应用程序、用户名和密码作为输入,并将散列密码存储在安全位置。 get_password函数将服务和用户名作为输入,如果在安全存储位置找到相应的密码,则检索相应的密码。 自动化的 Python 脚本的第 2 部分 欢迎回来! 
在上一篇文章中,我们深入研究了 Python 脚本的世界,我们还没有揭开Python脚本的所有奥秘。 在本期中,我们将发现其余五种类型的脚本,这些脚本将让您立即像专业人士一样编码。 6、自动化数据分析: Python的pandas是数据分析和操作的强大工具。 以下脚本演示如何使用它自动执行清理、转换和分析数据集的过程。 import pandas as pd # Reading a CSV file df = pd.read_csv("data.csv") # Cleaning data df.dropna(inplace=True) # Dropping missing values df = df[df["column_name"] != "some_value"] # Removing specific rows # Transforming data df["column_name"] = df["column_name"].str.lower() # Changing string to lowercase df["column_name"] = df["column_name"].astype(int) # Changing column datatype # Analyzing data print(df["column_name"].value_counts()) # Prints the frequency of unique values in the column # Saving the cleaned and transformed data to a new CSV file df.to_csv("cleaned_data.csv", index=False) 上面脚本中的注释对于具有 python 基础知识的人来说非常简单。 该脚本是一个简单的示例,用于演示 pandas 库的强大功能以及如何使用它来自动执行数据清理、转换和分析任务。 但是,脚本是有限的,在实际方案中,数据集可能要大得多,清理、转换和分析操作可能会更复杂。 7、自动化计算机视觉任务: 自动化计算机视觉任务是指使用 Python 及其库自动执行各种图像处理和计算机视觉操作。 Python 中最受欢迎的计算机视觉任务库之一是opencv OpenCV是一个主要针对实时计算机视觉的编程函数库。 它提供了广泛的功能,包括图像和视频 I/O、图像处理、视频分析、对象检测和识别等等。 例如: import cv2 # Load the cascade classifier for face detection face_cascade = cv2.CascadeClassifier("haarcascade_frontalface_default.xml") # Load the image img = cv2.imread("image.jpg") gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) # Detect faces faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5) # Draw rectangles around the faces for (x, y, w, h) in faces: cv2.rectangle(img, (x, y), (x+w, y+h), (255, 0, 0), 2) # Show the image cv2.imshow("Faces", img) cv2.waitKey(0) cv2.destroyAllWindows() 上面的脚本检测图像中的人脸。 它首先加载一个级联分类器用于人脸检测,这个分类器是一个预先训练的模型,可以识别图像中的人脸。 然后它加载图像并使用 cv2.cvtColor()方法将其转换为灰度。 然后将图像传递给分类器的 detectMultiScale()方法,该方法检测图像中的人脸。 该方法返回检测到的人脸的坐标列表。 然后,该脚本循环遍历坐标列表,并使用 cv2.rectangle()方法在检测到的人脸周围绘制矩形。 最后,使用 cv2.imshow()方法在屏幕上显示图像。 这只是OpenCV可以实现的目标的一个基本示例,还有更多可以自动化的功能,例如对象检测,图像处理和视频分析。 OpenCV 是一个非常强大的库,可用于自动执行各种计算机视觉任务,例如面部识别、对象跟踪和图像稳定。 8、自动化数据加密: 自动化数据加密是指使用 Python 及其库自动加密和解密数据和文件。 Python 中最受欢迎的数据加密库之一是密码学。 “密码学”是一个提供加密配方和原语的库。 它包括高级配方和常见加密算法(如对称密码、消息摘要和密钥派生函数)的低级接口。 以下示例演示了如何使用加密库加密文件: import os from cryptography.fernet import Fernet from cryptography.hazmat.backends import default_backend from cryptography.hazmat.primitives import hashes from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC password = b"super_secret_password" salt = os.urandom(16) kdf = PBKDF2HMAC( algorithm=hashes.SHA256, iterations=100000, length=32, salt=salt, backend=default_backend() ) key = base64.urlsafe_b64encode(kdf.derive(password)) cipher = Fernet(key) # Encrypt the file with open("file.txt", "rb") as f: data = f.read() cipher_text = cipher.encrypt(data) with open("file.txt", "wb") as f: f.write(cipher_text) 它首先使用 PBKDF2HMAC 密钥派生函数生成密钥,这是一个基于密码的密钥派生函数,使用安全哈希算法 SHA-256 和salt值。 salt 值是使用os.urandom()函数生成的,该函数生成加密安全的随机字节。 然后,它创建一个 Fernet 对象,该对象是对称(也称为“密钥”)身份验证加密的实现。 然后,它读取明文文件,并使用 Fernet 对象的encrypt()方法对其进行加密。 最后,它将加密数据写入文件。 请务必注意,用于加密文件的密钥必须保密并安全存储。 如果密钥丢失或泄露,加密的数据将无法读取。 9、自动化测试和调试: 自动化测试和调试是指使用 Python 及其库自动运行测试和调试代码。 在 Python 中,有几个流行的库用于自动化测试和调试,例如 unittest、pytest、nose 和 doctest。 下面是使用unittest 库自动测试在给定字符串中查找最长回文子字符串的 Python 函数的示例: def longest_palindrome(s): n = len(s) ans = "" for i in range(n): for j in range(i+1, n+1): substring = s[i:j] if substring == substring[::-1] and len(substring) > len(ans): ans = substring return ans class TestLongestPalindrome(unittest.TestCase): def test_longest_palindrome(self): self.assertEqual(longest_palindrome("babad"), "bab") self.assertEqual(longest_palindrome("cbbd"), "bb") 
self.assertEqual(longest_palindrome("a"), "a") self.assertEqual(longest_palindrome(""), "") if __name__ == '__main__': unittest.main() 此脚本使用 unittest 库自动测试在给定字符串中查找最长回文子字符串的 Python 函数。 'longest_palindrome' 函数将字符串作为输入,并通过遍历所有可能的子字符串并检查它是否是回文并且它的长度大于前一个来返回最长的回文子字符串。 该脚本还定义了一个从 unittest 继承的“TestLongestPalindrome”类。 测试用例,并包含多种测试方法。 每个测试方法都使用 assertEqual()方法来检查 longest_palindrome() 函数的输出是否等于预期的输出。 当脚本运行时,将调用unittest.main()函数,该函数运行TestLongestPalindrome类中的所有测试方法。 如果任何测试失败,即longest_palindrome()函数的输出不等于预期输出,则会打印一条错误消息,指示哪个测试失败以及预期和实际输出是什么。 此脚本是如何使用 unittest 库自动测试 Python 函数的示例。 它允许您在将代码部署到生产环境之前轻松测试代码并捕获任何错误或错误。 10、自动化时间序列预测: 自动化时间序列预测是指使用 Python 及其库自动预测时间序列数据的未来值。 在Python中,有几个流行的库可以自动化时间序列预测,例如statsmodels和prophet。 “prophet”是由Facebook开发的开源库,它提供了一种简单快捷的方式来执行时间序列预测。 它基于加法模型,其中非线性趋势与每年、每周和每天的季节性以及假日效应相吻合。 它最适合具有强烈季节性影响的时间序列和多个季节的历史数据。 下面是使用 prophet 库对每日销售数据执行时间序列预测的示例: import pandas as pd from fbprophet import Prophet # Read in data df = pd.read_csv("sales_data.csv") # Create prophet model model = Prophet() # Fit model to data model.fit(df) # Create future dataframe future_data = model.make_future_dataframe(periods=365) # Make predictions forecast = model.predict(future_data) # Print forecast dataframe print(forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']]) 正如Mr.所说:一张图片胜过千言万语 还可以通过在上面添加以下代码行来包含预测销售额的视觉对象: # Import visualization library import matplotlib.pyplot as plt # Plot predicted values model.plot(forecast) plt.show() # Plot predicted values with uncertainty intervals model.plot(forecast) plt.fill_between(forecast['ds'], forecast['yhat_lower'], forecast['yhat_upper'], color='pink') plt.show() # Plot component of the forecast model.plot_components(forecast) plt.show() 第一个可视化效果 model.plot(forecast) 显示预测值和历史数据,它可以让您大致了解模型拟合数据的程度。 第二个可视化效果: plt.fill_between(预测['ds'],预测['yhat_lower'],预测['yhat_upper'],color='pink') 显示具有不确定性区间的预测值,这使您可以查看预测中有多少不确定性。 第三个可视化效果 model.plot_components(forecast) 显示预测的组成部分,例如趋势、季节性和节假日。

The double colon "::"

What does the double colon "::" do in Python, and what do the following two expressions mean? str1[::-1] list1[3::4] The double colon is a special case of Python's sequence slicing. A slice takes three parameters, and when some of them are omitted, two consecutive colons appear.

Slice syntax: sequence[start:end:step]

Parameters:
start: the index where the slice begins. If omitted, the slice starts at the beginning of the sequence (index 0).
end: the index where the slice stops. If omitted, the slice runs to the end of the sequence.
step: the increment between elements in the slice. If omitted, the step defaults to 1.

Sequence slicing examples

list1 = [1, 2, 3, 4, 5, 6]
print(list1[0:5])    # [1, 2, 3, 4, 5]
print(list1[0:5:1])  # [1, 2, 3, 4, 5]
print(list1[1:3])    # [2, 3]
print(list1[0:5:2])  # [1, 3, 5]

The start, end and step parameters of a slice can all be omitted. Omitting both start and end produces two consecutive colons:

list1 = [1, 2, 3, 4, 5, 6]
print(list1[::1])   # [1, 2, 3, 4, 5, 6]
print(list1[::2])   # [1, 3, 5]
print(list1[::-1])  # [6, 5, 4, 3, 2, 1]

list1[::1]: slice from the first element to the last, with a step of 1.
list1[::2]: slice from the first element to the last, with a step of 2.
list1[::-1]: slice with a step of -1, which reverses the sequence.

Omitting only end also produces two consecutive colons:

list1 = [1, 2, 3, 4, 5, 6]
print(list1[2::1])   # [3, 4, 5, 6]
print(list1[2::2])   # [3, 5]
print(list1[1::-1])  # [2, 1]

list1[2::1]: slice from the element at index 2 to the last element, with a step of 1.
list1[2::2]: slice from the element at index 2 to the last element, with a step of 2.
list1[1::-1]: slice from the element at index 1 backwards to the first element, with a step of -1.

If all three parameters are omitted, list1[::] simply returns the whole sequence. Slicing works on strings, lists and other sequences, and understanding this syntax is essential for writing effective Python code.
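Coming back to the two expressions from the opening question, with a sample string and list of my own:

str1 = "hello python"
print(str1[::-1])   # 'nohtyp olleh' - a step of -1 walks the string backwards, i.e. reverses it

list1 = list(range(20))
print(list1[3::4])  # [3, 7, 11, 15, 19] - start at index 3 and take every 4th element to the end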

    Selenium 选择元素的方法之css表达式

    前面我们介绍了通过CSS selector 语法根据ID、class属性、href属性,以及tag名来选择元素。今天来介绍CSS selector的另一个选择元素的强大之处: 选择语法联合使用 案例1:在网页中有如下一段html代码: <li> <ul class="clearfix"> <li class="tag_title"> 文学 </li> <li> <a href="/tag/小说" class="tag">小说</a> </li> <li> <a href="/tag/随笔" class="tag">随笔</a> </li> <li> <a href="/tag/日本文学" class="tag">日本文学</a> </li> <li class="last"> <a href="/tag/散文" class="tag">散文</a> </li> <li> <a href="/tag/诗歌" class="tag">诗歌</a> </li> <li> <a href="/tag/童话" class="tag">童话</a> </li> <li> <a href="/tag/名著" class="tag">名著</a> </li> <li class="last"> <a href="/tag/港台" class="tag">港台</a> </li> <li class="last"> <a href="/tag/?view=type&amp;icn=index-sorttags-hot#文学" class="tag more_tag">更多»</a> </li> </ul> </li> 如果我们要选择网页 html 中的 <li class="tag_title">文学</li>元素。 CSS selector 表达式可以有这几种写法: 写法1:选择ul节点class属性值为clearfix的子节点中,li节点class属性值为tag_title的元素 element = wdtd.find_element(By.CSS_SELECTOR,'ul.clearfix > li.tag_title') 写法2:选择class属性值为clearfix的子节点中,class属性值为tag_title的元素(不限制节点类型),中间用大于号隔开 element = wdtd.find_element(By.CSS_SELECTOR,'.clearfix > .tag_title') 写法3:选择class属性值为clearfix的子节点中,class属性值为tag_title的元素(不限制节点类型),中间用空格隔开 element = wdtd.find_element(By.CSS_SELECTOR,'.clearfix .tag_title') 示例代码1: """CSS Selector语法选择元素的方法:选择语法联合使用""" from selenium import webdriver from selenium.webdriver.chrome.service import Service from selenium.webdriver.common.by import By # 创建Webdriver对象wdtd,并将webdriver.Chrome()赋值给wdtd wdtd = webdriver.Chrome() # 调用WebDriver对象的get方法,让浏览器打开指定网址(豆瓣网) wdtd.get('https://book.douban.com/') # 选择元素方法:选择语法联合使用 # 写法1: element = wdtd.find_element(By.CSS_SELECTOR,'ul.clearfix > li.tag_title') # 写法2: # element = wdtd.find_element(By.CSS_SELECTOR,'.clearfix > .tag_title') # 写法3: # element = wdtd.find_element(By.CSS_SELECTOR,'.clearfix .tag_title') # 打印出元素对应的html print(element.get_attribute('outerHTML')) # input() 运行结果:分别运行示例中的3种写法的代码,结果都为如下: <li class="tag_title"> 文学 </li> 结果验证:分别验证3种表达式写法的正确性 写法1: 写法2: 写法3: 选择元素的方法:CSS表达式,选择语法组选择前面我们介绍了CSS selector的选择元素的强大功能:选择语法联合使用。接下来我们介绍CSS selector的选择元素的另一强大功能:组选择。 案例2:在网页中有如下一段html代码: <ul class="hot-tags-col5 s" data-dstat-areaid="54" data-dstat-mode="click,expose"> <li> <ul class="clearfix"> <li class="tag_title"> 文学 </li> <li> <a href="/tag/小说" class="tag">小说</a> </li> <li> <a href="/tag/随笔" class="tag">随笔</a> </li> <li> <a href="/tag/日本文学" class="tag">日本文学</a> </li> <li class="last"> <a href="/tag/散文" class="tag">散文</a> </li> <li> <a href="/tag/诗歌" class="tag">诗歌</a> </li> <li> <a href="/tag/童话" class="tag">童话</a> </li> <li> <a href="/tag/名著" class="tag">名著</a> </li> <li class="last"> <a href="/tag/港台" class="tag">港台</a> </li> <li class="last"> <a href="/tag/?view=type&amp;icn=index-sorttags-hot#文学" class="tag more_tag">更多»</a> </li> </ul> </li> <li> <ul class="clearfix"> <li class="tag_title"> 流行 </li> <li> <a href="/tag/漫画" class="tag">漫画</a> </li> <li> <a href="/tag/推理" class="tag">推理</a> </li> <li> <a href="/tag/绘本" class="tag">绘本</a> </li> <li class="last"> <a href="/tag/科幻" class="tag">科幻</a> </li> <li> <a href="/tag/青春" class="tag">青春</a> </li> <li> <a href="/tag/言情" class="tag">言情</a> </li> <li> <a href="/tag/奇幻" class="tag">奇幻</a> </li> <li class="last"> <a href="/tag/武侠" class="tag">武侠</a> </li> <li class="last"> <a href="/tag/?view=type&amp;icn=index-sorttags-hot#流行" class="tag more_tag">更多»</a> </li> </ul> </li> <li> <ul class="clearfix"> <li class="tag_title"> 文化 </li> <li> <a href="/tag/历史" class="tag">历史</a> </li> <li> <a href="/tag/哲学" class="tag">哲学</a> </li> <li> <a href="/tag/传记" class="tag">传记</a> </li> <li class="last"> <a 
href="/tag/设计" class="tag">设计</a> </li> <li> <a href="/tag/电影" class="tag">电影</a> </li> <li> <a href="/tag/建筑" class="tag">建筑</a> </li> <li> <a href="/tag/回忆录" class="tag">回忆录</a> </li> <li class="last"> <a href="/tag/音乐" class="tag">音乐</a> </li> <li class="last"> <a href="/tag/?view=type&amp;icn=index-sorttags-hot#文化" class="tag more_tag">更多»</a> </li> </ul> </li> <li> <ul class="clearfix"> <li class="tag_title"> 生活 </li> <li> <a href="/tag/旅行" class="tag">旅行</a> </li> <li> <a href="/tag/励志" class="tag">励志</a> </li> <li> <a href="/tag/教育" class="tag">教育</a> </li> <li class="last"> <a href="/tag/职场" class="tag">职场</a> </li> <li> <a href="/tag/美食" class="tag">美食</a> </li> <li> <a href="/tag/灵修" class="tag">灵修</a> </li> <li> <a href="/tag/健康" class="tag">健康</a> </li> <li class="last"> <a href="/tag/家居" class="tag">家居</a> </li> <li class="last"> <a href="/tag/?view=type&amp;icn=index-sorttags-hot#生活" class="tag more_tag">更多»</a> </li> </ul> </li> <li> <ul class="clearfix"> <li class="tag_title"> 经管 </li> <li> <a href="/tag/经济学" class="tag">经济学</a> </li> <li> <a href="/tag/管理" class="tag">管理</a> </li> <li> <a href="/tag/商业" class="tag">商业</a> </li> <li class="last"> <a href="/tag/金融" class="tag">金融</a> </li> <li> <a href="/tag/营销" class="tag">营销</a> </li> <li> <a href="/tag/理财" class="tag">理财</a> </li> <li> <a href="/tag/股票" class="tag">股票</a> </li> <li class="last"> <a href="/tag/企业史" class="tag">企业史</a> </li> <li class="last"> <a href="/tag/?view=type&amp;icn=index-sorttags-hot#经管" class="tag more_tag">更多»</a> </li> </ul> </li> <li> <ul class="clearfix"> <li class="tag_title"> 科技 </li> <li> <a href="/tag/科普" class="tag">科普</a> </li> <li> <a href="/tag/互联网" class="tag">互联网</a> </li> <li> <a href="/tag/编程" class="tag">编程</a> </li> <li class="last"> <a href="/tag/交互设计" class="tag">交互设计</a> </li> <li> <a href="/tag/算法" class="tag">算法</a> </li> <li> <a href="/tag/通信" class="tag">通信</a> </li> <li> <a href="/tag/神经网络" class="tag">神经网络</a> </li> <li class="last"> <a href="/tag/?view=type&amp;icn=index-sorttags-hot#科技" class="tag more_tag">更多»</a> </li> </ul> </li> </ul> 如果我们要同时选择网页html 中所有的 li标签下class名为tag_title和class名为last的元素。这时css选择器使用逗号(,)分隔,称之为[组选择],组选择的CSS selector 表达式的写法是: 写法1: element = wdtd.find_element(By.CSS_SELECTOR,'li.tag_title , li.last') 示例代码1: """CSS Selector语法选择元素的方法:选择语法组选择""" from selenium import webdriver from selenium.webdriver.chrome.service import Service from selenium.webdriver.common.by import By # 创建Webdriver对象wdtd,并将webdriver.Chrome()赋值给wdtd wdtd = webdriver.Chrome() # 调用WebDriver对象的get方法,让浏览器打开指定网址(豆瓣网) wdtd.get('https://book.douban.com/') # 选择元素方法:组选择 # 写法1:同时选择网页html中所有的li标签下class名为tag_title和class名为last的元素,这时css选择器使用逗号,称之为[组选择] elements = wdtd.find_elements(By.CSS_SELECTOR,'li.tag_title,li.last') for element in elements: print(element.text) 运行结果:运行示例中的写法1的代码,结果为如下: 文学 散文 港台 更多» 流行 科幻 武侠 更多» 文化 设计 音乐 更多» 生活 职场 家居 更多» 经管 金融 企业史 更多» 科技 交互设计 更多» 结果验证:用浏览器打开被测系统页面,按键盘上的F12键打开调试控制台窗口,点击 Elements 标签后, 按Ctrl+F键,在搜索栏输入任何 CSS Selector 表达式 ,本例中为:ul.clearfix > li.tag_title 如果能选择到元素,搜索栏的方框结尾处就会显示出类似 1 of 6 这样的内容(如下图序号①)。同时,html中被选择的元素那行就会呈现高亮显示(如下图序号②和③)。 写法1:验证写法1表达式写法的正确性 前面我们介绍了通过CSS selector 语法根据ID、class属性、href属性,以及tag名来选择元素。CSS selector的另两个选择元素的强大之处: 选择语法联合使用和组选择: (1)通过CSS Selector语法:选择语法联合使用时,表达式中间用大于号或者空格隔开,CSS selector 表达式可以有这几种写法: 写法1:选择ul节点class属性值为clearfix的子节点中,li节点class属性值为tag_title的元素 element = wdtd.find_element(By.CSS_SELECTOR,'ul.clearfix > li.tag_title') 写法2:选择class属性值为clearfix的子节点中,class属性值为tag_title的元素(不限制节点类型),中间用大于号隔开 element = 
wdtd.find_element(By.CSS_SELECTOR,'.clearfix > .tag_title') 写法3:选择class属性值为clearfix的子节点中,class属性值为tag_title的元素(不限制节点类型),中间用空格隔开 element = wdtd.find_element(By.CSS_SELECTOR,'.clearfix .tag_title') (2)通过CSS Selector语法:组选择时,表达式中间用逗号(,)隔开CSS selector 表达式可以有这几种写法: 写法1: element = wdtd.find_element(By.CSS_SELECTOR,'li.tag_title , li.last')

Do anything you want with Selenium

"The getDevTools() method returns a new Chrome DevTools object that allows you to send built-in Selenium commands targeting the CDP via the send() method. These commands are wrapper methods that make calling CDP functions cleaner and simpler." (SHAMA UGALE) First of all, what is Chrome DevTools?

An Introduction to Chrome DevTools


Chrome DevTools is a set of tools built directly into Chromium-based browsers (such as Chrome, Opera, and Microsoft Edge) to help developers debug and investigate websites. With Chrome DevTools, developers get deeper access to a website and can: inspect elements in the DOM, edit elements and CSS on the fly, check and monitor the site's performance, emulate the user's geolocation, emulate faster/slower network speeds, execute and debug JavaScript, view console logs, and more.

    Selenium 4 Chrome DevTools API


Selenium is an umbrella project for a range of tools and libraries that enable web browser automation. Selenium 4 adds native support for the Chrome DevTools API. With these new APIs, our tests can now: capture and monitor network traffic and performance; mock geolocation for location-aware, localization, and internationalization testing; change the device mode and test the application's responsiveness. And that is just the tip of the iceberg! Selenium 4 introduces the new ChromiumDriver class, which includes two methods for accessing Chrome DevTools: getDevTools() and executeCdpCommand(). The getDevTools() method returns the new DevTools object, which lets you send built-in Selenium commands targeting the CDP via the send() method. These commands are wrapper methods that make calling CDP functions cleaner and easier. The executeCdpCommand() method also lets you execute CDP methods, but in a more raw form: instead of using the wrapped APIs, you pass in the Chrome DevTools command and its parameters directly. Use executeCdpCommand() when a CDP command has no Selenium wrapper API, or when you want to invoke it differently from the Selenium API. Chromium-based drivers such as ChromeDriver and EdgeDriver now inherit from ChromiumDriver, so you can access the Selenium CDP APIs from those drivers as well. Let's explore how to use these new Selenium 4 APIs to solve various use cases.

Emulating device mode


Most of the applications we build today are responsive, to meet the needs of end users coming from a variety of platforms, devices (phones, tablets, wearables, desktops), and screen orientations. As testers, we may want to render our application at different sizes to trigger its responsive behavior. How can we do this with Selenium's new CDP capabilities? The CDP command for overriding device metrics is Emulation.setDeviceMetricsOverride, and it requires a width, a height, a mobile flag, and a device scale factor as input. These four keys are required for this scenario; there are also several optional keys. In our Selenium test we could use the DevTools::send() method with the built-in setDeviceMetricsOverride() command, but that Selenium API takes 12 arguments: the 4 required ones plus 8 optional ones. For each of the 8 optional parameters we do not need to send, we would have to pass Optional.empty(). To simplify this and pass only the required parameters, the code below uses the raw executeCdpCommand() method. package com.devtools; import org.openqa.selenium.chrome.ChromeDriver; import org.openqa.selenium.devtools.DevTools; import java.util.HashMap; import java.util.Map; public class SetDeviceMode { final static String PROJECT_PATH = System.getProperty("user.dir"); public static void main(String[] args){ System.setProperty("webdriver.chrome.driver", PROJECT_PATH + "/src/main/resources/chromedriver"); ChromeDriver driver; driver = new ChromeDriver(); DevTools devTools = driver.getDevTools(); devTools.createSession(); Map deviceMetrics = new HashMap() {{ put("width", 600); put("height", 1000); put("mobile", true); put("deviceScaleFactor", 50); }}; driver.executeCdpCommand("Emulation.setDeviceMetricsOverride", deviceMetrics); driver.get("https://www.google.com"); } } On line 19 I create a map containing the keys this command requires. Then on line 26 I call executeCdpCommand() and pass two arguments: the command name "Emulation.setDeviceMetricsOverride" and the map of device metrics. On line 27 I open the Google home page, which renders with the metrics I provided, as shown in the image below. With a solution like Applitools Eyes, we can use these new Selenium commands not only to test quickly across different viewports, but also to catch any inconsistencies at scale. Eyes is smart enough not to report false positives for tiny, imperceptible UI differences caused by different browsers and viewports.
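Since these notes are mostly about Python, here is a rough Python counterpart: a minimal sketch assuming Selenium 4 and a Chromium-based driver, where execute_cdp_cmd() plays the role of executeCdpCommand().

from selenium import webdriver

driver = webdriver.Chrome()
device_metrics = {
    "width": 600,
    "height": 1000,
    "mobile": True,
    "deviceScaleFactor": 50,
}
# Send the raw CDP command with only the required parameters
driver.execute_cdp_cmd("Emulation.setDeviceMetricsOverride", device_metrics)
driver.get("https://www.google.com")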

Mocking geolocation


In many cases we need to test location-specific functionality, such as special offers or location-based pricing. For this we can use the DevTools API to mock the location. @Test public void mockLocation(){ devTools.send(Emulation.setGeolocationOverride( Optional.of(48.8584), Optional.of(2.2945), Optional.of(100))); driver.get("https://mycurrentlocation.net/"); try { Thread.sleep(30000); } catch (InterruptedException e) { e.printStackTrace(); } } Emulating network speed: Many users access web applications from handheld devices connected over Wi-Fi or cellular networks. Weak network signal, and therefore a slow internet connection, is common. It can be important to test how the application behaves under a slow (2G) or intermittently dropping connection. The CDP command for faking the network connection is Network.emulateNetworkConditions. Information about the required and optional parameters of this command can be found in the documentation. With access to Chrome DevTools we can simulate these scenarios. Let's see how. package com.devtools; import org.openqa.selenium.chrome.ChromeDriver; import org.openqa.selenium.devtools.DevTools; import org.openqa.selenium.devtools.network.Network; import org.openqa.selenium.devtools.network.model.ConnectionType; import java.util.HashMap; import java.util.Map; import java.util.Optional; public class SetNetwork { final static String PROJECT_PATH = System.getProperty("user.dir"); public static void main(String[] args){ System.setProperty("webdriver.chrome.driver", PROJECT_PATH + "/src/main/resources/chromedriver"); ChromeDriver driver; driver = new ChromeDriver(); DevTools devTools = driver.getDevTools(); devTools.createSession(); devTools.send(Network.enable(Optional.empty(), Optional.empty(), Optional.empty())); devTools.send(Network.emulateNetworkConditions( false, 20, 20, 50, Optional.of(ConnectionType.CELLULAR2G) )); driver.get("https://www.google.com"); } } On line 21 we obtain the DevTools object by calling getDevTools(). We then call send() to enable Network, and call send() again to pass the built-in Network.emulateNetworkConditions() command together with the parameters we want to send with it. Finally, we open the Google home page under the emulated network conditions.
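For reference, a rough Python sketch of the same two ideas (assumes Selenium 4 with a Chrome driver; set_network_conditions() and execute_cdp_cmd() are Chromium-only helpers, and the throttling values here are illustrative):

from selenium import webdriver

driver = webdriver.Chrome()
# Mock the geolocation with a raw CDP command (latitude, longitude, accuracy)
driver.execute_cdp_cmd("Emulation.setGeolocationOverride", {
    "latitude": 48.8584, "longitude": 2.2945, "accuracy": 100,
})
# Throttle the connection; throughput values are in bytes per second
driver.set_network_conditions(offline=False, latency=20,
                              download_throughput=50 * 1024,
                              upload_throughput=20 * 1024)
driver.get("https://www.google.com")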

Capturing HTTP requests


Using DevTools, we can capture the HTTP requests made by the application and access the method, data, headers, and more. Let's look at sample code that captures the HTTP requests, the URI, and the request method. package com.devtools; import org.openqa.selenium.chrome.ChromeDriver; import org.openqa.selenium.devtools.DevTools; import org.openqa.selenium.devtools.network.Network; import java.util.Optional; public class CaptureNetworkTraffic { private static ChromeDriver driver; private static DevTools chromeDevTools; final static String PROJECT_PATH = System.getProperty("user.dir"); public static void main(String[] args){ System.setProperty("webdriver.chrome.driver", PROJECT_PATH + "/src/main/resources/chromedriver"); driver = new ChromeDriver(); chromeDevTools = driver.getDevTools(); chromeDevTools.createSession(); chromeDevTools.send(Network.enable(Optional.empty(), Optional.empty(), Optional.empty())); chromeDevTools.addListener(Network.requestWillBeSent(), entry -> { System.out.println("Request URI : " + entry.getRequest().getUrl()+"\n" + " With method : "+entry.getRequest().getMethod() + "\n"); entry.getRequest().getMethod(); }); driver.get("https://www.google.com"); chromeDevTools.send(Network.disable()); } } The CDP command to start capturing network traffic is Network.enable. Information about its required and optional parameters can be found in the documentation. In our code, line 22 uses DevTools::send() to send the Network.enable CDP command and turn on network traffic capture. Line 23 adds a listener that listens to every request the application sends. For each captured request we extract the URL with getRequest().getUrl() and the HTTP method with getRequest().getMethod(). On line 29 we open the Google home page and print to the console the URI and HTTP method of every request this page makes. Once we are done capturing requests, we can send the Network.disable CDP command to stop capturing network traffic, as shown on line 30.

Intercepting HTTP responses


To intercept responses we use the Network.responseReceived event. This event fires when an HTTP response becomes available, and we can listen for the URL, response headers, response code, and so on. To get the response body, use the Network.getResponseBody method. @Test public void validateResponse() { final RequestId[] requestIds = new RequestId[1]; devTools.send(Network.enable(Optional.of(100000000), Optional.empty(), Optional.empty())); devTools.addListener(Network.responseReceived(), responseReceived -> { if (responseReceived.getResponse().getUrl().contains("api.zoomcar.com")) { System.out.println("URL: " + responseReceived.getResponse().getUrl()); System.out.println("Status: " + responseReceived.getResponse().getStatus()); System.out.println("Type: " + responseReceived.getType().toJson()); responseReceived.getResponse().getHeaders().toJson().forEach((k, v) -> System.out.println((k + ":" + v))); requestIds[0] = responseReceived.getRequestId(); System.out.println("Response Body: \n" + devTools.send(Network.getResponseBody(requestIds[0])).getBody() + "\n"); } }); driver.get("https://www.zoomcar.com/bangalore"); driver.findElement(By.className("search")).click(); } Accessing console logs: We all rely on logs for debugging and failure analysis. When testing an application with specific data or under specific conditions, logs help us debug and capture error messages, giving more insight into what gets published in the Console tab of Chrome DevTools. We can capture console logs from our Selenium scripts by calling the CDP Log commands, as shown below. package com.devtools; import org.openqa.selenium.chrome.ChromeDriver; import org.openqa.selenium.devtools.DevTools; import org.openqa.selenium.devtools.log.Log; public class CaptureConsoleLogs { private static ChromeDriver driver; private static DevTools chromeDevTools; final static String PROJECT_PATH = System.getProperty("user.dir"); public static void main(String[] args){ System.setProperty("webdriver.chrome.driver", PROJECT_PATH + "/src/main/resources/chromedriver"); driver = new ChromeDriver(); chromeDevTools = driver.getDevTools(); chromeDevTools.createSession(); chromeDevTools.send(Log.enable()); chromeDevTools.addListener(Log.entryAdded(), logEntry -> { System.out.println("log: "+logEntry.getText()); System.out.println("level: "+logEntry.getLevel()); }); driver.get("https://testersplayground.herokuapp.com/console-5d63b2b2-3822-4a01-8197-acd8aa7e1343.php"); } } In our code, line 19 uses DevTools::send() to enable console log capture. We then add a listener to capture all console logs recorded by the application. For every captured log we extract the log text with getText() and the log level with getLevel(). Finally, we open the application and capture the console error logs it publishes.

Capturing performance metrics


In today's fast-paced world, where we iterate on software so quickly, we should also detect performance bottlenecks iteratively. Poorly performing websites and slow-loading pages frustrate customers. Can we verify these metrics on every build? Yes, we can! The CDP command for capturing performance metrics is Performance.enable. Information about this command can be found in the documentation. Let's see how to do this with Selenium 4 and the Chrome DevTools API. package com.devtools; import org.openqa.selenium.chrome.ChromeDriver; import org.openqa.selenium.devtools.DevTools; import org.openqa.selenium.devtools.performance.Performance; import org.openqa.selenium.devtools.performance.model.Metric; import java.util.Arrays; import java.util.List; import java.util.stream.Collectors; public class GetMetrics { final static String PROJECT_PATH = System.getProperty("user.dir"); public static void main(String[] args){ System.setProperty("webdriver.chrome.driver", PROJECT_PATH + "/src/main/resources/chromedriver"); ChromeDriver driver = new ChromeDriver(); DevTools devTools = driver.getDevTools(); devTools.createSession(); devTools.send(Performance.enable()); driver.get("https://www.google.org"); List<Metric> metrics = devTools.send(Performance.getMetrics()); List<String> metricNames = metrics.stream() .map(o -> o.getName()) .collect(Collectors.toList()); devTools.send(Performance.disable()); List<String> metricsToCheck = Arrays.asList( "Timestamp", "Documents", "Frames", "JSEventListeners", "LayoutObjects", "MediaKeySessions", "Nodes", "Resources", "DomContentLoaded", "NavigationStart"); metricsToCheck.forEach( metric -> System.out.println(metric + " is : " + metrics.get(metricNames.indexOf(metric)).getValue())); } } First, we create a session by calling DevTools' createSession() method, as shown on line 19. Next, we enable DevTools to capture performance metrics by sending the Performance.enable() command to send(), as shown on line 20. Once performance capture is enabled, we can open the application and then send the Performance.getMetrics() command to send(). This returns a list of Metric objects, which we can stream to get the names of all captured metrics, as shown on line 25. We then disable performance capture by sending the Performance.disable() command to send(), as shown on line 29. To view the metrics we care about, we define a list named metricsToCheck and loop over it to print each metric's value.
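A rough Python counterpart using raw CDP calls (a sketch; assumes Selenium 4 with a Chromium driver, and that Performance.getMetrics returns a "metrics" list of name/value pairs as documented in the CDP Performance domain):

from selenium import webdriver

driver = webdriver.Chrome()
driver.execute_cdp_cmd("Performance.enable", {})
driver.get("https://www.google.org")
result = driver.execute_cdp_cmd("Performance.getMetrics", {})
driver.execute_cdp_cmd("Performance.disable", {})
# result["metrics"] is a list of {"name": ..., "value": ...} dicts
for metric in result.get("metrics", []):
    if metric["name"] in ("Documents", "Frames", "Nodes", "JSEventListeners"):
        print(metric["name"], "is :", metric["value"])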

Basic authentication


In Selenium there is no way to interact with browser popups, because Selenium can only interact with DOM elements. This poses a challenge for popups such as authentication dialogs. We can work around it by handling authentication directly with DevTools through the CDP API. The CDP command for setting additional request headers is Network.setExtraHTTPHeaders. Here is how to call it in Selenium 4. package com.devtools; import org.apache.commons.codec.binary.Base64; import org.openqa.selenium.By; import org.openqa.selenium.chrome.ChromeDriver; import org.openqa.selenium.devtools.DevTools; import org.openqa.selenium.devtools.network.Network; import org.openqa.selenium.devtools.network.model.Headers; import java.util.HashMap; import java.util.Map; import java.util.Optional; public class SetAuthHeader { private static final String USERNAME = "guest"; private static final String PASSWORD = "guest"; final static String PROJECT_PATH = System.getProperty("user.dir"); public static void main(String[] args){ System.setProperty("webdriver.chrome.driver", PROJECT_PATH + "/src/main/resources/chromedriver"); ChromeDriver driver = new ChromeDriver(); //Create DevTools session and enable Network DevTools chromeDevTools = driver.getDevTools(); chromeDevTools.createSession(); chromeDevTools.send(Network.enable(Optional.empty(), Optional.empty(), Optional.empty())); //Open website driver.get("https://jigsaw.w3.org/HTTP/"); //Send authorization header Map headers = new HashMap<>(); String basicAuth ="Basic " + new String(new Base64().encode(String.format("%s:%s", USERNAME, PASSWORD).getBytes())); headers.put("Authorization", basicAuth); chromeDevTools.send(Network.setExtraHTTPHeaders(new Headers(headers))); //Click authentication test - this normally invokes a browser popup if unauthenticated driver.findElement(By.linkText("Basic Authentication test")).click(); String loginSuccessMsg = driver.findElement(By.tagName("html")).getText(); if(loginSuccessMsg.contains("Your browser made it!")){ System.out.println("Login successful"); }else{ System.out.println("Login failed"); } driver.quit(); } } We first create a session with the DevTools object and enable Network, shown on lines 25-26. Next, we open our website and create the authorization header to send. On line 35 we send the setExtraHTTPHeaders command to send() along with the header data. This authenticates us and lets us bypass the browser popup. To test it, we click the Basic Authentication test link. If you tried this manually, you would see a browser popup asking you to log in. Because we sent the authorization header, the popup does not appear in our script; instead the page reports "Your browser made it!".
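A rough Python sketch of the same trick (assumes Selenium 4 with a Chrome driver; the header shape follows the CDP Network.setExtraHTTPHeaders documentation, and the test page is the same W3C Jigsaw page used above):

from base64 import b64encode
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.execute_cdp_cmd("Network.enable", {})
token = b64encode(b"guest:guest").decode()
# Attach the Authorization header to every request the browser makes
driver.execute_cdp_cmd("Network.setExtraHTTPHeaders",
                       {"headers": {"Authorization": "Basic " + token}})
driver.get("https://jigsaw.w3.org/HTTP/")
driver.find_element(By.LINK_TEXT, "Basic Authentication test").click()
print("Your browser made it!" in driver.page_source)
driver.quit()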

Summary


With the addition of the CDP API, Selenium has become even more powerful. We can now enhance our tests to capture HTTP network traffic, gather performance metrics, handle authentication, and mock geolocation, time zones, and device modes, along with anything else that becomes possible in Chrome DevTools! References: Selenium website: https://www.selenium.dev/ Selenium documentation: https://www.selenium.dev/documentation/en/ Selenium getting-started guide: https://www.selenium.dev/documentation/en/getting_started/ Selenium API docs: https://www.selenium.dev/selenium/docs/api/py/index.html

Windows GUI Automation with Python

We will explore how to do Windows GUI automation with Python. GUI automation lets us automate many tasks that interact with the operating system, such as moving the mouse, clicking buttons, typing text, and moving windows. Python provides two powerful libraries, pyautogui and pywinauto, that make GUI automation simple. Let's look at each in detail.

    pyautogui

pyautogui is a pure-Python GUI automation library that can simulate keyboard input, mouse clicks and movement, locate images on the screen, and more. It has good support for Windows, macOS, and Linux, which helps us write cross-platform automation scripts.

When to use pyautogui

pyautogui has a wide range of uses. Some common examples: Testing: automation scripts can run complex test cases for us, such as UI tests and functional tests. Data entry: if we need to type the same data into several forms or applications, an automation script saves a lot of time and effort. Batch operations: if we need to perform the same operation on a large number of files or records, automation scripts come in handy as well.

Installing pyautogui

Before using pyautogui, we need to install it into our Python environment. Run the following command: pip install pyautogui

pyautogui: open Notepad, type text, and save

Next, let's walk through a simple example of using pyautogui. In this example we use pyautogui to open Notepad, type some text, then save the file and close the editor. First, we import pyautogui and enable its fail-safe feature, so the automation stops immediately when we move the mouse to the top-left corner of the screen: import pyautogui pyautogui.FAILSAFE = True Then we use pyautogui's hotkey function to simulate pressing Win+R and open the Run dialog: pyautogui.hotkey('win', 'r') Next, we use the typewrite function to type "notepad" and press Enter: pyautogui.typewrite('notepad', interval=0.25) pyautogui.press('enter') Then we wait a moment so Notepad can open fully before typing some text: import time time.sleep(2) # wait for Notepad to open pyautogui.typewrite('Hello, world!', interval=0.25) The typewrite function simulates keyboard input; the interval parameter sets the pause between characters to mimic human typing speed. Next, we use the hotkey function to simulate Ctrl+S and save the file: pyautogui.hotkey('ctrl', 's') # press the Save hotkey combination time.sleep(1) # wait for the Save dialog to appear Then we type the file name and press Enter to save the file: pyautogui.typewrite('hello_world.txt', interval=0.25) pyautogui.press('enter') # press the Enter key Finally, we use the hotkey function to simulate Alt+F4 and close Notepad: pyautogui.hotkey('alt', 'f4') # close Notepad That is a simple automation script written with pyautogui. As it shows, pyautogui offers a very intuitive and easy-to-use interface that lets us build fairly complex automation with little code.
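The snippets above, collected into one runnable sketch (assumes an English-language Windows Notepad; key sequences and sleep times may need tweaking on other systems):

import time
import pyautogui

pyautogui.FAILSAFE = True           # abort by moving the mouse to the top-left corner

pyautogui.hotkey('win', 'r')        # open the Run dialog
pyautogui.typewrite('notepad', interval=0.25)
pyautogui.press('enter')
time.sleep(2)                       # give Notepad time to open

pyautogui.typewrite('Hello, world!', interval=0.25)

pyautogui.hotkey('ctrl', 's')       # open the Save dialog
time.sleep(1)
pyautogui.typewrite('hello_world.txt', interval=0.25)
pyautogui.press('enter')

pyautogui.hotkey('alt', 'f4')       # close Notepad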

    pywinauto

pywinauto is mainly used for testing and automating Windows GUI applications.

When to use pywinauto

Regression testing: run the same tests regularly to make sure the software still works after changes or updates. Quality assurance: verify that a new version or feature matches the expected user experience. Continuous integration / continuous deployment (CI/CD) pipelines: test the software as part of automated build and deployment. Task automation: automate repetitive GUI operations such as file management and software installation.

Automating the Windows Calculator with pywinauto

Below is a short pywinauto tutorial in which we automate the Windows Calculator. First, make sure Python and pywinauto are installed in your environment; you can install pywinauto with pip: pip install pywinauto Then we can write a simple script that launches the Calculator and performs an operation: from pywinauto.application import Application # Launch the Windows Calculator app = Application().start("calc.exe") # Attach to the Calculator window dlg = app.window(title=' 计算器 ') # Type 2+2 into the Calculator dlg.type_keys('2+2=') # Print the result print("Result:", dlg.Static2.window_text()) This code launches the Windows Calculator, performs 2+2, and prints the result. Please note: this example assumes a layout similar to the Windows 10 Calculator, and the window title ' 计算器 ' is the localized Chinese name (use 'Calculator' on an English system). Other Windows versions may require adjusting the code.

Automating Windows Notepad with pywinauto


Importing the modules

In the Python script we need to import the pywinauto library. We also import the time library, because we may need to pause between some operations. Launching the application: With pywinauto's Application object we can start and control an application. For example, to open Notepad we can do: app = Application().start("notepad.exe") Working with the window: After launching the application we usually need to interact with its window. We can get the window from the app object (for example via attribute access like app.Notepad, or app.window(...)), and then call the window's methods to perform operations such as clicking buttons or typing text. For example, we can type some text into Notepad: app.Notepad.Edit.type_keys("Hello, World!", with_spaces = True) The type_keys method simulates keystrokes. The with_spaces = True argument tells pywinauto to send the spaces in the string literally (by default type_keys strips them out).

Saving and closing

Finally, we can save our file by simulating clicks on the menu items, and then close Notepad: app.Notepad.menu_select("File -> Save As") app.SaveAs.Edit.set_edit_text("example.txt") app.SaveAs.Save.click() time.sleep(1) app.Notepad.menu_select("File -> Exit") In this example, the menu_select method simulates clicking a menu item, set_edit_text types text into a text box, and click presses a button. Please note: this example assumes Notepad's menus are in English; if your Notepad is localized (for example in Chinese), adjust the menu strings accordingly. That is a basic example of Windows GUI automation with Python and pywinauto. Of course, pywinauto has much more to explore, including handling complex window structures and simulating mouse actions.
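The pieces above, assembled into one runnable sketch (assumes an English-language classic Notepad, so menu captions and dialog titles may differ on localized or newer Windows builds):

import time
from pywinauto.application import Application

# Start Notepad and type into its edit control
app = Application().start("notepad.exe")
app.Notepad.Edit.type_keys("Hello, World!", with_spaces=True)

# Save the file through the File menu, then exit
app.Notepad.menu_select("File -> Save As")
app.SaveAs.Edit.set_edit_text("example.txt")
app.SaveAs.Save.click()
time.sleep(1)
app.Notepad.menu_select("File -> Exit")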

Final thoughts

pywinauto and pyautogui are both powerful GUI automation tools that can automate many tasks in Windows applications; pick whichever fits your needs. Hopefully this article and tutorial help you work more efficiently. If you have questions, you can also reach out on WeChat [somenzz-enjoy] to discuss and learn together.

What does the double colon "::" mean in Python?

The double colon "::" in Python appears in sequence slicing: sequence[start:end:step], as in str1[::-1] or list1[3::4]. The double colon is simply a special case of Python's slice syntax. A slice takes three parameters, and when some of them are omitted you end up with two adjacent colons. Slice syntax: sequence[start:end:step] Parameters: start: the index where the slice begins; if omitted, the slice starts at the beginning of the sequence (index 0). end: the index where the slice stops; if omitted, the slice runs to the end of the sequence. step: the increment between elements; if omitted, the default step is 1. Slicing examples: list1 = [1, 2, 3, 4, 5, 6] print(list1[0:5]) # output: [1, 2, 3, 4, 5] print(list1[0:5:1]) # output: [1, 2, 3, 4, 5] print(list1[1:3]) # output: [2, 3] print(list1[0:5:2]) # output: [1, 3, 5] The parameters start, end, and step can all be omitted. If start and end are omitted, two consecutive colons appear: list1 = [1, 2, 3, 4, 5, 6] print(list1[::1]) # output: [1, 2, 3, 4, 5, 6] print(list1[::2]) # output: [1, 3, 5] print(list1[::-1]) # output: [6, 5, 4, 3, 2, 1] list1[::1]: every element from first to last, with a step of 1. list1[::2]: every other element, with a step of 2. list1[::-1]: a step of -1, walking from the last element back to the first, which reverses the sequence. If only end is omitted, two consecutive colons also appear: list1 = [1, 2, 3, 4, 5, 6] print(list1[2::1]) # output: [3, 4, 5, 6] print(list1[2::2]) # output: [3, 5] print(list1[1::-1]) # output: [2, 1] list1[2::1]: from index 2 to the last element, step 1. list1[2::2]: from index 2 to the last element, step 2. list1[1::-1]: backward from index 1 to the start of the list, step -1. If all three parameters are omitted, list1[::] simply returns the whole sequence. Slicing is an essential Python skill that works on strings, lists, and other sequences; understanding this syntax lets any Python programmer write more effective code.

    Print to File in Python

Open the file in append mode with the open() built-in function, then pass the file object to print() via its file parameter. f = open('my_log.txt', 'a') print("Hello World", file=f) The syntax to print a list to the file is print(['apple', 'banana'], file=f) Similarly, we can print other data types to a file as well: text = "Hello World! Welcome to new world." print(text, file=f) # Print tuple to file print(('apple', 25), file=f) # Print set to file print({'a', 'e', 'i', 'o', 'u'}, file=f) # Print dictionary to file print({'mac' : 25, 'sony' : 22}, file=f) Search and Replace in Excel File using Python
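A small variant using a context manager so the file handle is closed automatically (a sketch; the file name is just an example):

with open('my_log.txt', 'a') as f:
    print("Hello World", file=f)              # plain string
    print(['apple', 'banana'], file=f)        # list
    print({'mac': 25, 'sony': 22}, file=f)    # dictionary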

    get url and save file

    Web Scraping with Selenium and Python
    import requests url = "https://www.geeksforgeeks.org/sql-using-python/" #just a random link of a dummy file r = requests.get(url) #retrieving data from the URL using get method with open("test.html", 'wb') as f: #giving a name and saving it in any required format #opening the file in write mode f.write(r.content) #writes the URL contents from the server print("test.html file created: ")

    How to set up a local HTTP server

    Run the following commands to start a local HTTP server: # If python -V returned 2.X.X python -m SimpleHTTPServer # If python -V returned 3.X.X python3 -m http.server # Note that on Windows you may need to run python -m http.server instead of python3 -m http.server You'll notice that both commands look very different – one calls SimpleHTTPServer and the other http.server. This is just because the SimpleHTTPServer module was rolled into Python's http.server in Python 3. They both work the same way. Now when you go to http://localhost:8000/ you should see a list of all the files in your directory. Then you can just click on the HTML file you want to view. Just keep in mind that SimpleHTTPServer and http.server are only for testing things locally. They only do very basic security checks and shouldn't be used in production.
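If you prefer to start the server from a script rather than the command line, here is a minimal sketch using the standard library; it serves the current working directory on port 8000, just like the commands above:

from http.server import HTTPServer, SimpleHTTPRequestHandler

# Serve the current directory at http://localhost:8000/ (Ctrl+C to stop)
server = HTTPServer(("", 8000), SimpleHTTPRequestHandler)
server.serve_forever()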

    How to send files locally

To set up a quick-and-dirty NAS (Network Attached Storage) system: Make sure both computers are connected to the same network via LAN or WiFi. Open your command prompt or terminal and run python -V to make sure Python is installed. Go to the directory with the file you want to share using cd (change directory) on *nix or macOS, or CD on Windows. Start your HTTP server with either python -m SimpleHTTPServer or python3 -m http.server. Open a new terminal and type ifconfig on *nix or macOS, or ipconfig on Windows, to find your IP address. Now on the second computer or device: open a browser and type in the IP address of the first machine, along with port 8000: http://[ip address]:8000 A page will open showing all the files in the directory being shared from the first computer. If the page takes too long to load, you may need to adjust the firewall settings on the first computer.

    Python Sends And Receives Message

    To Get Url In Python
    Fetch Internet Resources Using The urllib
    Sends And Receives Message from Client
    Building a Python Agent with CLI and Web API

    Command Line Arguments in Python

Python provides several ways of dealing with command line arguments. The three most common are sys.argv, the getopt module, and the argparse module (an argparse sketch follows below). A minimal sys.argv example: import sys # total arguments n = len(sys.argv) print("Total arguments passed:", n) # Arguments passed print("\nName of Python script:", sys.argv[0]) print("\nArguments passed:", end = " ") for i in range(1, n): print(sys.argv[i], end = " ") A separate utility snippet kept here for reference, which fetches a web page and saves the prettified HTML to temp.html: import requests from bs4 import BeautifulSoup URL = "http://www.guancha.cn/" r = requests.get(URL) soup = BeautifulSoup(r.content, 'html5lib') filename = 'temp.html' f = open(filename, "a", encoding = "utf-8") f.write(str(soup.prettify())) # write() argument must be str f.close() A fuller sys.argv example that also sums numeric arguments: import sys # total arguments n = len(sys.argv) print("\n\n\nTotal arguments passed:", n) # Arguments passed print("Name of Python script:", sys.argv[0]) print("\nArguments passed:", end = "\n") for i in range(0, n): print("Arguments ", i , " ", sys.argv[i]) # Addition of numbers Sum = 0 for i in range(1, n): Sum += int(sys.argv[i]) print("\n\nResult:", Sum)
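Since argparse is one of the common approaches mentioned above, here is a minimal sketch (the argument names are just examples):

import argparse

parser = argparse.ArgumentParser(description="Add some integers")
parser.add_argument("numbers", type=int, nargs="+", help="integers to sum")
parser.add_argument("--verbose", action="store_true", help="print extra detail")
args = parser.parse_args()

total = sum(args.numbers)
if args.verbose:
    print(f"Summing {args.numbers}")
print("Result:", total)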

    python read xls

import xlrd
wb = xlrd.open_workbook('2023年岀库表.xls')
sheet = wb.sheet_by_name('岀库登记')  # xlrd selects sheets by name, not by subscripting  #'岀库'
for row in range(2, 10):
    cell = sheet.cell(row, 3)
    print("row: ", row, " - ", cell.value)
print("Number of sheets: {0}".format(wb.nsheets))
print("Worksheet name(s): {0}".format(wb.sheet_names()))
sh = wb.sheet_by_index(0)
print("Sheet name {0}, rows {1}, columns {2}".format(sh.name, sh.nrows, sh.ncols))
print("Cell D10 is {0}".format(sh.cell_value(rowx=9, colx=3)))
print("Row count sh.nrows: ", sh.nrows)
#for rx in range(sh.nrows):
#    print(sh.row(rx))

    openpyxl, Python to read xlsx/xlsm files

    from openpyxl import Workbook wb = Workbook() # grab the active worksheet ws = wb.active # Data can be assigned directly to cells ws['A1'] = 42 # Rows can also be appended ws.append([1, 2, 3]) # Python types will automatically be converted import datetime ws['A2'] = datetime.datetime.now() # Save the file wb.save("sample.xlsx")

    Read XLSM File

    # Import the Pandas libraray as pd import pandas as pd # Read xlsm file df = pd.read_excel("score.xlsm",sheet_name='Sheet1',index_col=0) # Display the Data print(df)

    Read data from the Excel file

import pandas as pd excel_file = 'movies.xls' movies = pd.read_excel(excel_file) movies.head() movies_sheet1 = pd.read_excel(excel_file, sheet_name=0, index_col=0) movies_sheet1.head() movies_sheet2 = pd.read_excel(excel_file, sheet_name=1, index_col=0) movies_sheet2.head() movies_sheet3 = pd.read_excel(excel_file, sheet_name=2, index_col=0) movies_sheet3.head() movies = pd.concat([movies_sheet1, movies_sheet2, movies_sheet3]) movies.shape xlsx = pd.ExcelFile(excel_file) movies_sheets = [] for sheet in xlsx.sheet_names: movies_sheets.append(xlsx.parse(sheet)) movies = pd.concat(movies_sheets)

    Automate Excel in Python

import openpyxl as xl from openpyxl.chart import BarChart, Reference wb = xl.load_workbook('python-spreadsheet.xlsx') sheet = wb['Sheet1'] for row in range(2, sheet.max_row + 1): cell = sheet.cell(row, 3) corrected_price = float(cell.value.replace('$','')) * 0.9 corrected_price_cell = sheet.cell(row, 4) corrected_price_cell.value = corrected_price values = Reference(sheet, min_row=2, max_row=sheet.max_row, min_col=4, max_col=4) chart = BarChart() chart.add_data(values) sheet.add_chart(chart, 'e2') wb.save('python-spreadsheet2.xlsx') # openpyxl writes .xlsx files, not legacy .xls # Make it work for several spreadsheets, move the code inside a function def process_workbook(filename): wb = xl.load_workbook(filename) sheet = wb['Sheet1'] for row in range(2, sheet.max_row + 1): cell = sheet.cell(row, 3) corrected_price = float(cell.value.replace('$', '')) * 0.9 corrected_price_cell = sheet.cell(row, 4) corrected_price_cell.value = corrected_price values = Reference(sheet, min_row=2, max_row=sheet.max_row, min_col=4, max_col=4) chart = BarChart() chart.add_data(values) sheet.add_chart(chart, 'e2') wb.save(filename)

    OpenPyXL

    https://openpyxl.readthedocs.io/en/stable/ OpenPyXL is not your only choice. There are several other packages that support Microsoft Excel: xlrd – For reading older Excel (.xls) documents xlwt – For writing older Excel (.xls) documents xlwings – Works with new Excel formats and has macro capabilities A couple years ago, the first two used to be the most popular libraries to use with Excel documents. However, the author of those packages has stopped supporting them. The xlwings package has lots of promise, but does not work on all platforms and requires that Microsoft Excel is installed. You will be using OpenPyXL in this article because it is actively developed and supported. OpenPyXL doesn’t require Microsoft Excel to be installed, and it works on all platforms. You can install OpenPyXL using pip: $ python -m pip install openpyxl After the installation has completed, let’s find out how to use OpenPyXL to read an Excel spreadsheet!

    Getting Sheets from a Workbook


The first step is to find an Excel file to use with OpenPyXL. There is a books.xlsx file that is provided for you in this book’s Github repository. You can download it by going to this URL: https://github.com/driscollis/python101code/tree/master/chapter38_excel Feel free to use your own file, although the output from your own file won’t match the sample output in this book. The next step is to write some code to open the spreadsheet. To do that, create a new file named open_workbook.py and add this code to it: # open_workbook.py from openpyxl import load_workbook def open_workbook(path): workbook = load_workbook(filename=path) print(f'Worksheet names: {workbook.sheetnames}') sheet = workbook.active print(sheet) print(f'The title of the Worksheet is: {sheet.title}') if __name__ == '__main__': open_workbook('books.xlsx') In this example, you import load_workbook() from openpyxl and then create open_workbook() which takes in the path to your Excel spreadsheet. Next, you use load_workbook() to create an openpyxl.workbook.workbook.Workbook object. This object allows you to access the sheets and cells in your spreadsheet. And yes, it really does have the double workbook in its name. That’s not a typo! The rest of the open_workbook() function demonstrates how to print out all the currently defined sheets in your spreadsheet, get the currently active sheet and print out the title of that sheet. When you run this code, you will see the following output: Worksheet names: ['Sheet 1 - Books'] <Worksheet "Sheet 1 - Books"> The title of the Worksheet is: Sheet 1 - Books Now that you know how to access the sheets in the spreadsheet, you are ready to move on to accessing cell data!

    Reading Cell Data


When you are working with Microsoft Excel, the data is stored in cells. You need a way to access those cells from Python to be able to extract that data. OpenPyXL makes this process straight-forward. Create a new file named workbook_cells.py and add this code to it: # workbook_cells.py from openpyxl import load_workbook def get_cell_info(path): workbook = load_workbook(filename=path) sheet = workbook.active print(sheet) print(f'The title of the Worksheet is: {sheet.title}') print(f'The value of {sheet["A2"].value=}') print(f'The value of {sheet["A3"].value=}') cell = sheet['B3'] print(f'{cell.value=}') if __name__ == '__main__': get_cell_info('books.xlsx') This code will load up the Excel file in an OpenPyXL workbook. You will grab the active sheet and then print out its title and a couple of different cell values. You can access a cell by using the sheet object followed by square brackets with the column name and row number inside of it. For example, sheet["A2"] will get you the cell at column “A”, row 2. To get the value of that cell, you use the value attribute. Note: This code is using a new feature that was added to f-strings in Python 3.8. If you run this with an earlier version, you will receive an error. When you run this code, you will get this output: <Worksheet "Sheet 1 - Books"> The title of the Worksheet is: Sheet 1 - Books The value of sheet["A2"].value='Title' The value of sheet["A3"].value='Python 101' cell.value='Mike Driscoll' You can get additional information about a cell using some of its other attributes. Add the following function to your file and update the conditional statement at the end to run it: def get_info_by_coord(path): workbook = load_workbook(filename=path) sheet = workbook.active cell = sheet['A2'] print(f'Row {cell.row}, Col {cell.column} = {cell.value}') print(f'{cell.value=} is at {cell.coordinate=}') if __name__ == '__main__': get_info_by_coord('books.xlsx') In this example, you use the row and column attributes of the cell object to get the row and column information.
Note that column “A” maps to “1”, “B” to “2”, etcetera. If you were to iterate over the Excel document, you could use the coordinate attribute to get the cell name. When you run this code, the output will look like this: Row 2, Col 1 = Title cell.value='Title' is at cell.coordinate='A2' Speaking of iterating, let’s find out how to do that next!

    Iterating Over Rows and Columns


Sometimes you will need to iterate over the entire Excel spreadsheet or portions of the spreadsheet. OpenPyXL allows you to do that in a few different ways. Create a new file named iterating_over_cells.py and add the following code to it: # iterating_over_cells.py from openpyxl import load_workbook def iterating_range(path): workbook = load_workbook(filename=path) sheet = workbook.active for cell in sheet['A']: print(cell) if __name__ == '__main__': iterating_range('books.xlsx') Here you load up the spreadsheet and then loop over all the cells in column “A”. For each cell, you print out the cell object. You could use some of the cell attributes you learned about in the previous section if you wanted to format the output more granularly. This is what you get from running this code: <Cell 'Sheet 1 - Books'.A1> <Cell 'Sheet 1 - Books'.A2> <Cell 'Sheet 1 - Books'.A3> <Cell 'Sheet 1 - Books'.A4> <Cell 'Sheet 1 - Books'.A5> <Cell 'Sheet 1 - Books'.A6> <Cell 'Sheet 1 - Books'.A7> <Cell 'Sheet 1 - Books'.A8> <Cell 'Sheet 1 - Books'.A9> <Cell 'Sheet 1 - Books'.A10> # output truncated for brevity The output is truncated as it will print out quite a few cells by default. OpenPyXL provides other ways to iterate over rows and columns by using the iter_rows() and iter_cols() functions. These methods accept several arguments:
      min_row max_row min_col max_col
You can also add on a values_only argument that tells OpenPyXL to return the value of the cell instead of the cell object. Go ahead and create a new file named iterating_over_cell_values.py and add this code to it: # iterating_over_cell_values.py from openpyxl import load_workbook def iterating_over_values(path): workbook = load_workbook(filename=path) sheet = workbook.active for value in sheet.iter_rows( min_row=1, max_row=3, min_col=1, max_col=3, values_only=True, ): print(value) if __name__ == '__main__': iterating_over_values('books.xlsx') This code demonstrates how you can use the iter_rows() to iterate over the rows in the Excel spreadsheet and print out the values of those rows. When you run this code, you will get the following output: ('Books', None, None) ('Title', 'Author', 'Publisher') ('Python 101', 'Mike Driscoll', 'Mouse vs Python') The output is a Python tuple that contains the data within each column. At this point you have learned how to open spreadsheets and read data, both from specific cells and through iteration. You are now ready to learn how to use OpenPyXL to create Excel spreadsheets!

    Writing Excel Spreadsheets


Creating an Excel spreadsheet using OpenPyXL doesn’t take a lot of code. You can create a spreadsheet by using the Workbook() class. Go ahead and create a new file named writing_hello.py and add this code to it: # writing_hello.py from openpyxl import Workbook def create_workbook(path): workbook = Workbook() sheet = workbook.active sheet['A1'] = 'Hello' sheet['A2'] = 'from' sheet['A3'] = 'OpenPyXL' workbook.save(path) if __name__ == '__main__': create_workbook('hello.xlsx') Here you instantiate Workbook() and get the active sheet. Then you set the first three rows in column “A” to different strings. Finally, you call save() and pass it the path to save the new document to. Congratulations! You have just created an Excel spreadsheet with Python. Let’s discover how to add and remove sheets in your Workbook next!

    Adding and Removing Sheets


Many people like to organize their data across multiple Worksheets within the Workbook. OpenPyXL supports the ability to add new sheets to a Workbook() object via its create_sheet() method. Create a new file named creating_sheets.py and add this code to it: # creating_sheets.py import openpyxl def create_worksheets(path): workbook = openpyxl.Workbook() print(workbook.sheetnames) # Add a new worksheet workbook.create_sheet() print(workbook.sheetnames) # Insert a worksheet workbook.create_sheet(index=1, title='Second sheet') print(workbook.sheetnames) workbook.save(path) if __name__ == '__main__': create_worksheets('sheets.xlsx') Here you use create_sheet() twice to add two new Worksheets to the Workbook. The second example shows you how to set the title of a sheet and at which index to insert the sheet. The argument index=1 means that the worksheet will be added after the first existing worksheet, since they are indexed starting at 0. When you run this code, you will see the following output: ['Sheet'] ['Sheet', 'Sheet1'] ['Sheet', 'Second sheet', 'Sheet1'] You can see that the new sheets have been added step-by-step to your Workbook. After saving the file, you can verify that there are multiple Worksheets by opening Excel or another Excel-compatible application. After this automated worksheet-creation process, you’ve suddenly got too many sheets, so let’s get rid of some. There are two ways to remove a sheet. Go ahead and create delete_sheets.py to see how to use Python’s del keyword for removing worksheets: # delete_sheets.py import openpyxl def create_worksheets(path): workbook = openpyxl.Workbook() workbook.create_sheet() # Insert a worksheet workbook.create_sheet(index=1, title='Second sheet') print(workbook.sheetnames) del workbook['Second sheet'] print(workbook.sheetnames) workbook.save(path) if __name__ == '__main__': create_worksheets('del_sheets.xlsx') This code will create a new Workbook and then add two new Worksheets to it. Then it uses Python’s del keyword to delete workbook['Second sheet'].
You can verify that it worked as expected by looking at the print-out of the sheet list before and after the del command: ['Sheet', 'Second sheet', 'Sheet1'] ['Sheet', 'Sheet1'] The other way to delete a sheet from a Workbook is to use the remove() method. Create a new file called remove_sheets.py and enter this code to learn how that works: # remove_sheets.py import openpyxl def remove_worksheets(path): workbook = openpyxl.Workbook() sheet1 = workbook.create_sheet() # Insert a worksheet workbook.create_sheet(index=1, title='Second sheet') print(workbook.sheetnames) workbook.remove(sheet1) print(workbook.sheetnames) workbook.save(path) if __name__ == '__main__': remove_worksheets('remove_sheets.xlsx') This time around, you hold onto a reference to the first Worksheet that you create by assigning the result to sheet1. Then you remove it later on in the code. Alternatively, you could also remove that sheet by using the same syntax as before, like this: workbook.remove(workbook['Sheet1']) No matter which method you choose for removing the Worksheet, the output will be the same: ['Sheet', 'Second sheet', 'Sheet1'] ['Sheet', 'Second sheet'] Now let’s move on and learn how you can add and remove rows and columns.

    Adding and Deleting Rows and Columns


    OpenPyXL has several useful methods that you can use for adding and removing rows and columns in your spreadsheet. Here is a list of the four methods you will learn about in this section:
      .insert_rows() .delete_rows() .insert_cols() .delete_cols()
    Each of these methods can take two arguments:
      idx – The index to insert the row or column amount – The number of rows or columns to add
To see how this works, create a file named insert_demo.py and add the following code to it: # insert_demo.py from openpyxl import Workbook def inserting_cols_rows(path): workbook = Workbook() sheet = workbook.active sheet['A1'] = 'Hello' sheet['A2'] = 'from' sheet['A3'] = 'OpenPyXL' # insert a column before A sheet.insert_cols(idx=1) # insert 2 rows starting on the second row sheet.insert_rows(idx=2, amount=2) workbook.save(path) if __name__ == '__main__': inserting_cols_rows('inserting.xlsx') Here you create a Worksheet and insert a new column before column “A”. Columns are indexed starting at 1, whereas worksheets are indexed starting at 0. This effectively moves all the cells in column A to column B. Then you insert two new rows starting on row 2. Now that you know how to insert columns and rows, it is time for you to discover how to remove them. To find out how to remove columns or rows, create a new file named delete_demo.py and add this code: # delete_demo.py from openpyxl import Workbook def deleting_cols_rows(path): workbook = Workbook() sheet = workbook.active sheet['A1'] = 'Hello' sheet['B1'] = 'from' sheet['C1'] = 'OpenPyXL' sheet['A2'] = 'row 2' sheet['A3'] = 'row 3' sheet['A4'] = 'row 4' # Delete column A sheet.delete_cols(idx=1) # delete 2 rows starting on the second row sheet.delete_rows(idx=2, amount=2) workbook.save(path) if __name__ == '__main__': deleting_cols_rows('deleting.xlsx') This code creates text in several cells and then removes column A using delete_cols(). It also removes two rows starting on the 2nd row via delete_rows(). Being able to add and remove columns and rows can be quite useful when it comes to organizing your data.

    A Guide to Excel Spreadsheets in Python With openpyxl

    Getting Started

    Python Packages for Excel 1. openpyxl 2. xlrd 3. xlsxwriter 4. xlwt 5. xlutils $ pip install openpyxl from openpyxl import Workbook workbook = Workbook() sheet = workbook.active sheet["A1"] = "hello" sheet["B1"] = "world!" workbook.save(filename="hello_world.xlsx") your first spreadsheet created! Download Dataset: Click here to download the dataset for the openpyxl exercise you'll be following in this tutorial.

    A Simple Approach to Reading an Excel Spreadsheet

    >>> >>> from openpyxl import load_workbook >>> workbook = load_workbook(filename="sample.xlsx") >>> workbook.sheetnames ['Sheet 1'] >>> sheet = workbook.active >>> sheet <Worksheet "Sheet 1"> >>> sheet.title 'Sheet 1' In the code above, you first open the spreadsheet sample.xlsx using load_workbook(), and then you can use workbook.sheetnames to see all the sheets you have available to work with. After that, workbook.active selects the first available sheet and, in this case, you can see that it selects Sheet 1 automatically. Using these methods is the default way of opening a spreadsheet, and you'll see it many times during this tutorial. Now, after opening a spreadsheet, you can easily retrieve data from it like this: >>> >>> sheet["A1"] <Cell 'Sheet 1'.A1> >>> sheet["A1"].value 'marketplace' >>> sheet["F10"].value "G-Shock Men's Grey Sport Watch" To return the actual value of a cell, you need to do .value. Otherwise, you'll get the main Cell object. You can also use the method .cell() to retrieve a cell using index notation. Remember to add .value to get the actual value and not a Cell object: >>> >>> sheet.cell(row=10, column=6) <Cell 'Sheet 1'.F10> >>> sheet.cell(row=10, column=6).value "G-Shock Men's Grey Sport Watch" You can see that the results returned are the same, no matter which way you decide to go with. However, in this tutorial, you'll be mostly using the first approach: ["A1"]. Note: Even though in Python you're used to a zero-indexed notation, with spreadsheets you'll always use a one-indexed notation where the first row or column always has index 1. The above shows you the quickest way to open a spreadsheet. However, you can pass additional parameters to change the way a spreadsheet is loaded.

    Additional Reading Options

There are a few arguments you can pass to load_workbook() that change the way a spreadsheet is loaded. The most important ones are the following two Booleans (a short example follows the list):
    1. read_only loads a spreadsheet in read-only mode allowing you to open very large Excel files.
    2. data_only ignores loading formulas and instead loads only the resulting values.
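As a quick illustration of the two flags above, a minimal sketch (the file name is just an example; data_only can only return cached results for formula cells if the file was last saved by Excel):

from openpyxl import load_workbook

# Open a large file lazily and get computed values instead of formula strings
workbook = load_workbook(filename="sample.xlsx", read_only=True, data_only=True)
sheet = workbook.active
print(sheet["A1"].value)
workbook.close()  # read-only workbooks keep the file handle open until closed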

    Importing Data From a Spreadsheet

    Now that you've learned the basics about loading a spreadsheet, it's about time you get to the fun part: the iteration and actual usage of the values within the spreadsheet. This section is where you'll learn all the different ways you can iterate through the data, but also how to convert that data into something usable and, more importantly, how to do it in a Pythonic way.

    Iterating Through the Data

    There are a few different ways you can iterate through the data depending on your needs. You can slice the data with a combination of columns and rows: >>> >>> sheet["A1:C2"] ((<Cell 'Sheet 1'.A1>, <Cell 'Sheet 1'.B1>, <Cell 'Sheet 1'.C1>), (<Cell 'Sheet 1'.A2>, <Cell 'Sheet 1'.B2>, <Cell 'Sheet 1'.C2>)) You can get ranges of rows or columns: >>> >>> # Get all cells from column A >>> sheet["A"] (<Cell 'Sheet 1'.A1>, <Cell 'Sheet 1'.A2>, ... <Cell 'Sheet 1'.A99>, <Cell 'Sheet 1'.A100>) >>> # Get all cells for a range of columns >>> sheet["A:B"] ((<Cell 'Sheet 1'.A1>, <Cell 'Sheet 1'.A2>, ... <Cell 'Sheet 1'.A99>, <Cell 'Sheet 1'.A100>), (<Cell 'Sheet 1'.B1>, <Cell 'Sheet 1'.B2>, ... <Cell 'Sheet 1'.B99>, <Cell 'Sheet 1'.B100>)) >>> # Get all cells from row 5 >>> sheet[5] (<Cell 'Sheet 1'.A5>, <Cell 'Sheet 1'.B5>, ... <Cell 'Sheet 1'.N5>, <Cell 'Sheet 1'.O5>) >>> # Get all cells for a range of rows >>> sheet[5:6] ((<Cell 'Sheet 1'.A5>, <Cell 'Sheet 1'.B5>, ... <Cell 'Sheet 1'.N5>, <Cell 'Sheet 1'.O5>), (<Cell 'Sheet 1'.A6>, <Cell 'Sheet 1'.B6>, ... <Cell 'Sheet 1'.N6>, <Cell 'Sheet 1'.O6>)) You'll notice that all of the above examples return a tuple. If you want to refresh your memory on how to handle tuples in Python, check out the article on Lists and Tuples in Python. There are also multiple ways of using normal Python generators to go through the data. The main methods you can use to achieve this are:
    • .iter_rows()
    • .iter_cols()
    Both methods can receive the following arguments:
    • min_row
    • max_row
    • min_col
    • max_col
    These arguments are used to set boundaries for the iteration: >>> >>> for row in sheet.iter_rows(min_row=1, ... max_row=2, ... min_col=1, ... max_col=3): ... print(row) (<Cell 'Sheet 1'.A1>, <Cell 'Sheet 1'.B1>, <Cell 'Sheet 1'.C1>) (<Cell 'Sheet 1'.A2>, <Cell 'Sheet 1'.B2>, <Cell 'Sheet 1'.C2>) >>> for column in sheet.iter_cols(min_row=1, ... max_row=2, ... min_col=1, ... max_col=3): ... print(column) (<Cell 'Sheet 1'.A1>, <Cell 'Sheet 1'.A2>) (<Cell 'Sheet 1'.B1>, <Cell 'Sheet 1'.B2>) (<Cell 'Sheet 1'.C1>, <Cell 'Sheet 1'.C2>) You'll notice that in the first example, when iterating through the rows using .iter_rows(), you get one tuple element per row selected. While when using .iter_cols() and iterating through columns, you'll get one tuple per column instead. One additional argument you can pass to both methods is the Boolean values_only. When it's set to True, the values of the cell are returned, instead of the Cell object: >>> >>> for value in sheet.iter_rows(min_row=1, ... max_row=2, ... min_col=1, ... max_col=3, ... values_only=True): ... print(value) ('marketplace', 'customer_id', 'review_id') ('US', 3653882, 'R3O9SGZBVQBV76') If you want to iterate through the whole dataset, then you can also use the attributes .rows or .columns directly, which are shortcuts to using .iter_rows() and .iter_cols() without any arguments: >>> >>> for row in sheet.rows: ... print(row) (<Cell 'Sheet 1'.A1>, <Cell 'Sheet 1'.B1>, <Cell 'Sheet 1'.C1> ... <Cell 'Sheet 1'.M100>, <Cell 'Sheet 1'.N100>, <Cell 'Sheet 1'.O100>) These shortcuts are very useful when you're iterating through the whole dataset.

    Manipulate Data Using Python's Default Data Structures

    Now that you know the basics of iterating through the data in a workbook, let's look at smart ways of converting that data into Python structures. As you saw earlier, the result from all iterations comes in the form of tuples. However, since a tuple is nothing more than a list that's immutable, you can easily access its data and transform it into other structures. For example, say you want to extract product information from the sample.xlsx spreadsheet and into a dictionary where each key is a product ID. A straightforward way to do this is to iterate over all the rows, pick the columns you know are related to product information, and then store that in a dictionary. Let's code this out! First of all, have a look at the headers and see what information you care most about: >>> >>> for value in sheet.iter_rows(min_row=1, ... max_row=1, ... values_only=True): ... print(value) ('marketplace', 'customer_id', 'review_id', 'product_id', ...) This code returns a list of all the column names you have in the spreadsheet. To start, grab the columns with names:
    • product_id
    • product_parent
    • product_title
    • product_category
    Lucky for you, the columns you need are all next to each other so you can use the min_column and max_column to easily get the data you want: >>> >>> for value in sheet.iter_rows(min_row=2, ... min_col=4, ... max_col=7, ... values_only=True): ... print(value) ('B00FALQ1ZC', 937001370, 'Invicta Women\'s 15150 "Angel" 18k Yellow...) ('B00D3RGO20', 484010722, "Kenneth Cole New York Women's KC4944...) ... Nice! Now that you know how to get all the important product information you need, let's put that data into a dictionary: import json from openpyxl import load_workbook workbook = load_workbook(filename="sample.xlsx") sheet = workbook.active products = {} # Using the values_only because you want to return the cells' values for row in sheet.iter_rows(min_row=2, min_col=4, max_col=7, values_only=True): product_id = row[0] product = { "parent": row[1], "title": row[2], "category": row[3] } products[product_id] = product # Using json here to be able to format the output for displaying later print(json.dumps(products)) The code above returns a JSON similar to this: { "B00FALQ1ZC": { "parent": 937001370, "title": "Invicta Women's 15150 ...", "category": "Watches" }, "B00D3RGO20": { "parent": 484010722, "title": "Kenneth Cole New York ...", "category": "Watches" } } Here you can see that the output is trimmed to 2 products only, but if you run the script as it is, then you should get 98 products.

    Convert Data Into Python Classes

    To finalize the reading section of this tutorial, let's dive into Python classes and see how you could improve on the example above and better structure the data. For this, you'll be using the new Python Data Classes that are available from Python 3.7. If you're using an older version of Python, then you can use the default Classes instead. So, first things first, let's look at the data you have and decide what you want to store and how you want to store it. As you saw right at the start, this data comes from Amazon, and it's a list of product reviews. You can check the list of all the columns and their meaning on Amazon. There are two significant elements you can extract from the data available:
    1. Products
    2. Reviews
    A Product has:
    • ID
    • Title
    • Parent
    • Category
    The Review has a few more fields:
    • ID
    • Customer ID
    • Stars
    • Headline
    • Body
    • Date
    You can ignore a few of the review fields to make things a bit simpler. So, a straightforward implementation of these two classes could be written in a separate file classes.py: import datetime from dataclasses import dataclass @dataclass class Product: id: str parent: str title: str category: str @dataclass class Review: id: str customer_id: str stars: int headline: str body: str date: datetime.datetime After defining your data classes, you need to convert the data from the spreadsheet into these new structures. Before doing the conversion, it's worth looking at our header again and creating a mapping between columns and the fields you need: >>> >>> for value in sheet.iter_rows(min_row=1, ... max_row=1, ... values_only=True): ... print(value) ('marketplace', 'customer_id', 'review_id', 'product_id', ...) >>> # Or an alternative >>> for cell in sheet[1]: ... print(cell.value) marketplace customer_id review_id product_id product_parent ... Let's create a file mapping.py where you have a list of all the field names and their column location (zero-indexed) on the spreadsheet: # Product fields PRODUCT_ID = 3 PRODUCT_PARENT = 4 PRODUCT_TITLE = 5 PRODUCT_CATEGORY = 6 # Review fields REVIEW_ID = 2 REVIEW_CUSTOMER = 1 REVIEW_STARS = 7 REVIEW_HEADLINE = 12 REVIEW_BODY = 13 REVIEW_DATE = 14 You don't necessarily have to do the mapping above. It's more for readability when parsing the row data, so you don't end up with a lot of magic numbers lying around. Finally, let's look at the code needed to parse the spreadsheet data into a list of product and review objects: from datetime import datetime from openpyxl import load_workbook from classes import Product, Review from mapping import PRODUCT_ID, PRODUCT_PARENT, PRODUCT_TITLE, \ PRODUCT_CATEGORY, REVIEW_DATE, REVIEW_ID, REVIEW_CUSTOMER, \ REVIEW_STARS, REVIEW_HEADLINE, REVIEW_BODY # Using the read_only method since you're not gonna be editing the spreadsheet workbook = load_workbook(filename="sample.xlsx", read_only=True) sheet = workbook.active products = [] reviews = [] # Using the values_only because you just want to return the cell value for row in sheet.iter_rows(min_row=2, values_only=True): product = Product(id=row[PRODUCT_ID], parent=row[PRODUCT_PARENT], title=row[PRODUCT_TITLE], category=row[PRODUCT_CATEGORY]) products.append(product) # You need to parse the date from the spreadsheet into a datetime format spread_date = row[REVIEW_DATE] parsed_date = datetime.strptime(spread_date, "%Y-%m-%d") review = Review(id=row[REVIEW_ID], customer_id=row[REVIEW_CUSTOMER], stars=row[REVIEW_STARS], headline=row[REVIEW_HEADLINE], body=row[REVIEW_BODY], date=parsed_date) reviews.append(review) print(products[0]) print(reviews[0]) After you run the code above, you should get some output like this: Product(id='B00FALQ1ZC', parent=937001370, ...) Review(id='R3O9SGZBVQBV76', customer_id=3653882, ...) That's it! Now you should have the data in a very simple and digestible class format, and you can start thinking of storing this in a Database or any other type of data storage you like. Using this kind of OOP strategy to parse spreadsheets makes handling the data much simpler later on.

    Appending New Data

    Before you start creating very complex spreadsheets, have a quick look at an example of how to append data to an existing spreadsheet. Go back to the first example spreadsheet you created ( hello_world.xlsx) and try opening it and appending some data to it, like this: from openpyxl import load_workbook # Start by opening the spreadsheet and selecting the main sheet workbook = load_workbook(filename="hello_world.xlsx") sheet = workbook.active # Write what you want into a specific cell sheet["C1"] = "writing ;)" # Save the spreadsheet workbook.save(filename="hello_world_append.xlsx") Et voilà, if you open the new hello_world_append.xlsx spreadsheet, you'll see the following change: Notice the additional writing ;) on cell C1.

    Writing Excel Spreadsheets With openpyxl

    There are a lot of different things you can write to a spreadsheet, from simple text or number values to complex formulas, charts, or even images. Let's start creating some spreadsheets!

    Creating a Simple Spreadsheet

    Previously, you saw a very quick example of how to write “Hello world!” into a spreadsheet, so you can start with that: 1from openpyxl import Workbook 2 3filename = "hello_world.xlsx" 4 5workbook = Workbook() 6sheet = workbook.active 7 8sheet["A1"] = "hello" 9sheet["B1"] = "world!" 10 11workbook.save(filename=filename) The highlighted lines in the code above are the most important ones for writing. In the code, you can see that:
    • Line 5 shows you how to create a new empty workbook.
    • Lines 8 and 9 show you how to add data to specific cells.
    • Line 11 shows you how to save the spreadsheet when you're done.
Even though these lines are straightforward, it's still good to know them well for when things get a bit more complicated. Note: You'll be using the hello_world.xlsx spreadsheet for some of the upcoming examples, so keep it handy. One thing you can do to help with the upcoming code examples is add the following function to your Python file or console: >>> >>> def print_rows(): ... for row in sheet.iter_rows(values_only=True): ... print(row) It makes it easier to print all of your spreadsheet values by just calling print_rows().

    Basic Spreadsheet Operations

    Before you get into the more advanced topics, it's good for you to know how to manage the most simple elements of a spreadsheet.

    Adding and Updating Cell Values

    You already learned how to add values to a spreadsheet like this: >>> >>> sheet["A1"] = "value" There's another way you can do this, by first selecting a cell and then changing its value: >>> >>> cell = sheet["A1"] >>> cell <Cell 'Sheet'.A1> >>> cell.value 'hello' >>> cell.value = "hey" >>> cell.value 'hey' The new value is only stored into the spreadsheet once you call workbook.save(). The openpyxl creates a cell when adding a value, if that cell didn't exist before: >>> >>> # Before, our spreadsheet has only 1 row >>> print_rows() ('hello', 'world!') >>> # Try adding a value to row 10 >>> sheet["B10"] = "test" >>> print_rows() ('hello', 'world!') (None, None) (None, None) (None, None) (None, None) (None, None) (None, None) (None, None) (None, None) (None, 'test') As you can see, when trying to add a value to cell B10, you end up with a tuple with 10 rows, just so you can have that test value.

    Managing Rows and Columns

    One of the most common things you have to do when manipulating spreadsheets is adding or removing rows and columns. The openpyxl package allows you to do that in a very straightforward way by using the methods:
    • .insert_rows()
    • .delete_rows()
    • .insert_cols()
    • .delete_cols()
    Every single one of those methods can receive two arguments:
    1. idx
    2. amount
    Using our basic hello_world.xlsx example again, let's see how these methods work: >>> >>> print_rows() ('hello', 'world!') >>> # Insert a column before the existing column 1 ("A") >>> sheet.insert_cols(idx=1) >>> print_rows() (None, 'hello', 'world!') >>> # Insert 5 columns between column 2 ("B") and 3 ("C") >>> sheet.insert_cols(idx=3, amount=5) >>> print_rows() (None, 'hello', None, None, None, None, None, 'world!') >>> # Delete the created columns >>> sheet.delete_cols(idx=3, amount=5) >>> sheet.delete_cols(idx=1) >>> print_rows() ('hello', 'world!') >>> # Insert a new row in the beginning >>> sheet.insert_rows(idx=1) >>> print_rows() (None, None) ('hello', 'world!') >>> # Insert 3 new rows in the beginning >>> sheet.insert_rows(idx=1, amount=3) >>> print_rows() (None, None) (None, None) (None, None) (None, None) ('hello', 'world!') >>> # Delete the first 4 rows >>> sheet.delete_rows(idx=1, amount=4) >>> print_rows() ('hello', 'world!') The only thing you need to remember is that when inserting new data (rows or columns), the insertion happens before the idx parameter. So, if you do insert_rows(1), it inserts a new row before the existing first row. It's the same for columns: when you call insert_cols(2), it inserts a new column right before the already existing second column ( B). However, when deleting rows or columns, .delete_... deletes data starting from the index passed as an argument. For example, when doing delete_rows(2) it deletes row 2, and when doing delete_cols(3) it deletes the third column ( C).

    Managing Sheets

    Sheet management is also one of those things you might need to know, even though it might be something that you don't use that often. If you look back at the code examples from this tutorial, you'll notice the following recurring piece of code: sheet = workbook.active This is the way to select the default sheet from a spreadsheet. However, if you're opening a spreadsheet with multiple sheets, then you can always select a specific one like this: >>> >>> # Let's say you have two sheets: "Products" and "Company Sales" >>> workbook.sheetnames ['Products', 'Company Sales'] >>> # You can select a sheet using its title >>> products_sheet = workbook["Products"] >>> sales_sheet = workbook["Company Sales"] You can also change a sheet title very easily: >>> >>> workbook.sheetnames ['Products', 'Company Sales'] >>> products_sheet = workbook["Products"] >>> products_sheet.title = "New Products" >>> workbook.sheetnames ['New Products', 'Company Sales'] If you want to create or delete sheets, then you can also do that with .create_sheet() and .remove(): >>> >>> workbook.sheetnames ['Products', 'Company Sales'] >>> operations_sheet = workbook.create_sheet("Operations") >>> workbook.sheetnames ['Products', 'Company Sales', 'Operations'] >>> # You can also define the position to create the sheet at >>> hr_sheet = workbook.create_sheet("HR", 0) >>> workbook.sheetnames ['HR', 'Products', 'Company Sales', 'Operations'] >>> # To remove them, just pass the sheet as an argument to the .remove() >>> workbook.remove(operations_sheet) >>> workbook.sheetnames ['HR', 'Products', 'Company Sales'] >>> workbook.remove(hr_sheet) >>> workbook.sheetnames ['Products', 'Company Sales'] One other thing you can do is make duplicates of a sheet using copy_worksheet(): >>> >>> workbook.sheetnames ['Products', 'Company Sales'] >>> products_sheet = workbook["Products"] >>> workbook.copy_worksheet(products_sheet) <Worksheet "Products Copy"> >>> workbook.sheetnames ['Products', 'Company Sales', 'Products Copy'] If you open your spreadsheet after saving the above code, you'll notice that the sheet Products Copy is a duplicate of the sheet Products.

    Freezing Rows and Columns

    Something that you might want to do when working with big spreadsheets is to freeze a few rows or columns, so they remain visible when you scroll right or down. Freezing data allows you to keep an eye on important rows or columns, regardless of where you scroll in the spreadsheet. Again, openpyxl also has a way to accomplish this by using the worksheet freeze_panes attribute. For this example, go back to our sample.xlsx spreadsheet and try doing the following: >>> >>> workbook = load_workbook(filename="sample.xlsx") >>> sheet = workbook.active >>> sheet.freeze_panes = "C2" >>> workbook.save("sample_frozen.xlsx") If you open the sample_frozen.xlsx spreadsheet in your favorite spreadsheet editor, you'll notice that row 1 and columns A and B are frozen and are always visible no matter where you navigate within the spreadsheet. This feature is handy, for example, to keep headers within sight, so you always know what each column represents. Here's how it looks in the editor: Notice how you're at the end of the spreadsheet, and yet, you can see both row 1 and columns A and B.

    Adding Filters

    You can use openpyxl to add filters and sorts to your spreadsheet. However, when you open the spreadsheet, the data won't be rearranged according to these sorts and filters. At first, this might seem like a pretty useless feature, but when you're programmatically creating a spreadsheet that is going to be sent and used by somebody else, it's still nice to at least create the filters and allow people to use it afterward. The code below is an example of how you would add some filters to our existing sample.xlsx spreadsheet: >>> >>> # Check the used spreadsheet space using the attribute "dimensions" >>> sheet.dimensions 'A1:O100' >>> sheet.auto_filter.ref = "A1:O100" >>> workbook.save(filename="sample_with_filters.xlsx") You should now see the filters created when opening the spreadsheet in your editor: You don't have to use sheet.dimensions if you know precisely which part of the spreadsheet you want to apply filters to.
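If you only want filters over part of the sheet, you can pass a smaller range instead of sheet.dimensions. openpyxl's AutoFilter object also lets you record a sort condition; the snippet below is my own sketch based on that API, not part of the original tutorial:

from openpyxl import load_workbook

workbook = load_workbook(filename="sample.xlsx")
sheet = workbook.active

# Filter only the star rating and helpful votes columns (H and I)
sheet.auto_filter.ref = "H1:I100"

# Record a sort on the star ratings; like the filter itself, it only takes
# effect when someone applies it in their spreadsheet editor
sheet.auto_filter.add_sort_condition("H2:H100")

workbook.save(filename="sample_partial_filters.xlsx")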

    Adding Formulas

Formulas (or formulae) are one of the most powerful features of spreadsheets. They give you the power to apply specific mathematical equations to a range of cells. Using formulas with openpyxl is as simple as editing the value of a cell. You can see the list of formulas supported by openpyxl: >>> >>> from openpyxl.utils import FORMULAE >>> FORMULAE frozenset({'ABS', 'ACCRINT', 'ACCRINTM', 'ACOS', 'ACOSH', 'AMORDEGRC', 'AMORLINC', 'AND', ... 'YEARFRAC', 'YIELD', 'YIELDDISC', 'YIELDMAT', 'ZTEST'}) Let's add some formulas to our sample.xlsx spreadsheet. Starting with something easy, let's check the average star rating for the 99 reviews within the spreadsheet: >>> >>> # Star rating is column "H" >>> sheet["P2"] = "=AVERAGE(H2:H100)" >>> workbook.save(filename="sample_formulas.xlsx") If you open the spreadsheet now and go to cell P2, you should see that its value is: 4.18181818181818. Have a look in the editor: You can use the same methodology to add any formulas to your spreadsheet. For example, let's count the number of reviews that had helpful votes: >>> >>> # The helpful votes are counted on column "I" >>> sheet["P3"] = '=COUNTIF(I2:I100, ">0")' >>> workbook.save(filename="sample_formulas.xlsx") You should get the number 21 on your P3 spreadsheet cell like so: You'll have to make sure that the strings within a formula are always in double quotes, so you either have to use single quotes around the formula like in the example above or you'll have to escape the double quotes inside the formula: "=COUNTIF(I2:I100, \">0\")". There are a ton of other formulas you can add to your spreadsheet using the same procedure you tried above. Give it a go yourself!

    Adding Styles

    Even though styling a spreadsheet might not be something you would do every day, it's still good to know how to do it. Using openpyxl, you can apply multiple styling options to your spreadsheet, including fonts, borders, colors, and so on. Have a look at the openpyxl documentation to learn more. You can also choose to either apply a style directly to a cell or create a template and reuse it to apply styles to multiple cells. Let's start by having a look at simple cell styling, using our sample.xlsx again as the base spreadsheet: >>> >>> # Import necessary style classes >>> from openpyxl.styles import Font, Color, Alignment, Border, Side >>> # Create a few styles >>> bold_font = Font(bold=True) >>> big_red_text = Font(color="00FF0000", size=20) >>> center_aligned_text = Alignment(horizontal="center") >>> double_border_side = Side(border_style="double") >>> square_border = Border(top=double_border_side, ... right=double_border_side, ... bottom=double_border_side, ... left=double_border_side) >>> # Style some cells! >>> sheet["A2"].font = bold_font >>> sheet["A3"].font = big_red_text >>> sheet["A4"].alignment = center_aligned_text >>> sheet["A5"].border = square_border >>> workbook.save(filename="sample_styles.xlsx") If you open your spreadsheet now, you should see quite a few different styles on the first 5 cells of column A: There you go. You got:
    • A2 with the text in bold
    • A3 with the text in red and bigger font size
    • A4 with the text centered
    • A5 with a square border around the text
    Note: For the colors, you can also use HEX codes instead by doing Font(color="C70E0F"). You can also combine styles by simply adding them to the cell at the same time: >>> >>> # Reusing the same styles from the example above >>> sheet["A6"].alignment = center_aligned_text >>> sheet["A6"].font = big_red_text >>> sheet["A6"].border = square_border >>> workbook.save(filename="sample_styles.xlsx") Have a look at cell A6 here: When you want to apply multiple styles to one or several cells, you can use a NamedStyle class instead, which is like a style template that you can use over and over again. Have a look at the example below: >>> >>> from openpyxl.styles import NamedStyle >>> # Let's create a style template for the header row >>> header = NamedStyle(name="header") >>> header.font = Font(bold=True) >>> header.border = Border(bottom=Side(border_style="thin")) >>> header.alignment = Alignment(horizontal="center", vertical="center") >>> # Now let's apply this to all first row (header) cells >>> header_row = sheet[1] >>> for cell in header_row: ... cell.style = header >>> workbook.save(filename="sample_styles.xlsx") If you open the spreadsheet now, you should see that its first row is bold, the text is aligned to the center, and there's a small bottom border! Have a look below: As you saw above, there are many options when it comes to styling, and it depends on the use case, so feel free to check openpyxl documentation and see what other things you can do.
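If you plan to refer to a NamedStyle by name, openpyxl also lets you register it on the workbook first. The short sketch below is not from the original tutorial, so treat the details as an assumption:

from openpyxl import load_workbook
from openpyxl.styles import NamedStyle, Font, Border, Side, Alignment

workbook = load_workbook(filename="sample.xlsx")
sheet = workbook.active

header = NamedStyle(name="header")
header.font = Font(bold=True)
header.border = Border(bottom=Side(border_style="thin"))
header.alignment = Alignment(horizontal="center", vertical="center")

# Register the style once, then refer to it by its name
workbook.add_named_style(header)
for cell in sheet[1]:
    cell.style = "header"

workbook.save(filename="sample_registered_style.xlsx")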

    Conditional Formatting

    This feature is one of my personal favorites when it comes to adding styles to a spreadsheet. It's a much more powerful approach to styling because it dynamically applies styles according to how the data in the spreadsheet changes. In a nutshell, conditional formatting allows you to specify a list of styles to apply to a cell (or cell range) according to specific conditions. For example, a widespread use case is to have a balance sheet where all the negative totals are in red, and the positive ones are in green. This formatting makes it much more efficient to spot good vs bad periods. Without further ado, let's pick our favorite spreadsheet— sample.xlsx—and add some conditional formatting. You can start by adding a simple one that adds a red background to all reviews with less than 3 stars: >>> >>> from openpyxl.styles import PatternFill >>> from openpyxl.styles.differential import DifferentialStyle >>> from openpyxl.formatting.rule import Rule >>> red_background = PatternFill(fgColor="00FF0000") >>> diff_style = DifferentialStyle(fill=red_background) >>> rule = Rule(type="expression", dxf=diff_style) >>> rule.formula = ["$H1<3"] >>> sheet.conditional_formatting.add("A1:O100", rule) >>> workbook.save("sample_conditional_formatting.xlsx") Now you'll see all the reviews with a star rating below 3 marked with a red background: Code-wise, the only things that are new here are the objects DifferentialStyle and Rule:
    • DifferentialStyle is quite similar to NamedStyle, which you already saw above, and it's used to aggregate multiple styles such as fonts, borders, alignment, and so forth.
    • Rule is responsible for selecting the cells and applying the styles if the cells match the rule's logic.
Using a Rule object, you can create numerous conditional formatting scenarios. However, for simplicity's sake, the openpyxl package offers 3 built-in formats that make it easier to create a few common conditional formatting patterns. These built-ins are:
    • ColorScale
    • IconSet
    • DataBar
    The ColorScale gives you the ability to create color gradients: >>> >>> from openpyxl.formatting.rule import ColorScaleRule >>> color_scale_rule = ColorScaleRule(start_type="min", ... start_color="00FF0000", # Red ... end_type="max", ... end_color="0000FF00") # Green >>> # Again, let's add this gradient to the star ratings, column "H" >>> sheet.conditional_formatting.add("H2:H100", color_scale_rule) >>> workbook.save(filename="sample_conditional_formatting_color_scale.xlsx") Now you should see a color gradient on column H, from red to green, according to the star rating: You can also add a third color and make two gradients instead: >>> >>> from openpyxl.formatting.rule import ColorScaleRule >>> color_scale_rule = ColorScaleRule(start_type="num", ... start_value=1, ... start_color="00FF0000", # Red ... mid_type="num", ... mid_value=3, ... mid_color="00FFFF00", # Yellow ... end_type="num", ... end_value=5, ... end_color="0000FF00") # Green >>> # Again, let's add this gradient to the star ratings, column "H" >>> sheet.conditional_formatting.add("H2:H100", color_scale_rule) >>> workbook.save(filename="sample_conditional_formatting_color_scale_3.xlsx") This time, you'll notice that star ratings between 1 and 3 have a gradient from red to yellow, and star ratings between 3 and 5 have a gradient from yellow to green: The IconSet allows you to add an icon to the cell according to its value: >>> >>> from openpyxl.formatting.rule import IconSetRule >>> icon_set_rule = IconSetRule("5Arrows", "num", [1, 2, 3, 4, 5]) >>> sheet.conditional_formatting.add("H2:H100", icon_set_rule) >>> workbook.save("sample_conditional_formatting_icon_set.xlsx") You'll see a colored arrow next to the star rating. This arrow is red and points down when the value of the cell is 1 and, as the rating gets better, the arrow starts pointing up and becomes green: The openpyxl package has a full list of other icons you can use, besides the arrow. Finally, the DataBar allows you to create progress bars: >>> >>> from openpyxl.formatting.rule import DataBarRule >>> data_bar_rule = DataBarRule(start_type="num", ... start_value=1, ... end_type="num", ... end_value="5", ... color="0000FF00") # Green >>> sheet.conditional_formatting.add("H2:H100", data_bar_rule) >>> workbook.save("sample_conditional_formatting_data_bar.xlsx") You'll now see a green progress bar that gets fuller the closer the star rating is to the number 5: As you can see, there are a lot of cool things you can do with conditional formatting. Here, you saw only a few examples of what you can achieve with it, but check the openpyxl documentation to see a bunch of other options.
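Besides the three built-ins above, openpyxl also ships helper rules such as CellIsRule, which lets you express a simple comparison without writing the formula by hand. The snippet below is a sketch based on that helper and is not part of the original walkthrough:

from openpyxl import load_workbook
from openpyxl.styles import PatternFill
from openpyxl.formatting.rule import CellIsRule

workbook = load_workbook(filename="sample.xlsx")
sheet = workbook.active

# Highlight star ratings below 3 with a light red fill
red_fill = PatternFill(start_color="FFC7CE", end_color="FFC7CE", fill_type="solid")
rule = CellIsRule(operator="lessThan", formula=["3"], fill=red_fill)
sheet.conditional_formatting.add("H2:H100", rule)

workbook.save(filename="sample_conditional_formatting_cellis.xlsx")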

    Adding Images

Even though images are not something that you'll often see in a spreadsheet, it's quite cool to be able to add them. Maybe you can use it for branding purposes or to make spreadsheets more personal. To be able to load images to a spreadsheet using openpyxl, you'll have to install Pillow: $ pip install Pillow Apart from that, you'll also need an image. For this example, you can grab the Real Python logo below and convert it from .webp to .png using an online converter such as cloudconvert.com, save the final file as logo.png, and copy it to the root folder where you're running your examples: Afterward, this is the code you need to import that image into the hello_world.xlsx spreadsheet: from openpyxl import load_workbook from openpyxl.drawing.image import Image # Let's use the hello_world spreadsheet since it has less data workbook = load_workbook(filename="hello_world.xlsx") sheet = workbook.active logo = Image("logo.png") # A bit of resizing to not fill the whole spreadsheet with the logo logo.height = 150 logo.width = 150 sheet.add_image(logo, "A3") workbook.save(filename="hello_world_logo.xlsx") You have an image on your spreadsheet! Here it is: The image's left top corner is on the cell you chose, in this case, A3.

    Adding Pretty Charts

    Another powerful thing you can do with spreadsheets is create an incredible variety of charts. Charts are a great way to visualize and understand loads of data quickly. There are a lot of different chart types: bar chart, pie chart, line chart, and so on. openpyxl has support for a lot of them. Here, you'll see only a couple of examples of charts because the theory behind it is the same for every single chart type: Note: A few of the chart types that openpyxl currently doesn't have support for are Funnel, Gantt, Pareto, Treemap, Waterfall, Map, and Sunburst. For any chart you want to build, you'll need to define the chart type: BarChart, LineChart, and so forth, plus the data to be used for the chart, which is called Reference. Before you can build your chart, you need to define what data you want to see represented in it. Sometimes, you can use the dataset as is, but other times you need to massage the data a bit to get additional information. Let's start by building a new workbook with some sample data: 1from openpyxl import Workbook 2from openpyxl.chart import BarChart, Reference 3 4workbook = Workbook() 5sheet = workbook.active 6 7# Let's create some sample sales data 8rows = [ 9 ["Product", "Online", "Store"], 10 [1, 30, 45], 11 [2, 40, 30], 12 [3, 40, 25], 13 [4, 50, 30], 14 [5, 30, 25], 15 [6, 25, 35], 16 [7, 20, 40], 17] 18 19for row in rows: 20 sheet.append(row) Now you're going to start by creating a bar chart that displays the total number of sales per product: 22chart = BarChart() 23data = Reference(worksheet=sheet, 24 min_row=1, 25 max_row=8, 26 min_col=2, 27 max_col=3) 28 29chart.add_data(data, titles_from_data=True) 30sheet.add_chart(chart, "E2") 31 32workbook.save("chart.xlsx") There you have it. Below, you can see a very straightforward bar chart showing the difference between online product sales online and in-store product sales: Like with images, the top left corner of the chart is on the cell you added the chart to. In your case, it was on cell E2. Note: Depending on whether you're using Microsoft Excel or an open-source alternative (LibreOffice or OpenOffice), the chart might look slightly different. Try creating a line chart instead, changing the data a bit: 1import random 2from openpyxl import Workbook 3from openpyxl.chart import LineChart, Reference 4 5workbook = Workbook() 6sheet = workbook.active 7 8# Let's create some sample sales data 9rows = [ 10 ["", "January", "February", "March", "April", 11 "May", "June", "July", "August", "September", 12 "October", "November", "December"], 13 [1, ], 14 [2, ], 15 [3, ], 16] 17 18for row in rows: 19 sheet.append(row) 20 21for row in sheet.iter_rows(min_row=2, 22 max_row=4, 23 min_col=2, 24 max_col=13): 25 for cell in row: 26 cell.value = random.randrange(5, 100) With the above code, you'll be able to generate some random data regarding the sales of 3 different products across a whole year. Once that's done, you can very easily create a line chart with the following code: 28chart = LineChart() 29data = Reference(worksheet=sheet, 30 min_row=2, 31 max_row=4, 32 min_col=1, 33 max_col=13) 34 35chart.add_data(data, from_rows=True, titles_from_data=True) 36sheet.add_chart(chart, "C6") 37 38workbook.save("line_chart.xlsx") Here's the outcome of the above piece of code: One thing to keep in mind here is the fact that you're using from_rows=True when adding the data. This argument makes the chart plot row by row instead of column by column. 
In your sample data, you see that each product has a row with 12 values (1 column per month). That's why you use from_rows. If you don't pass that argument, by default, the chart tries to plot by column, and you'll get a month-by-month comparison of sales. Another difference that has to do with the above argument change is the fact that our Reference now starts from the first column, min_col=1, instead of the second one. This change is needed because the chart now expects the first column to have the titles. There are a couple of other things you can also change regarding the style of the chart. For example, you can add specific categories to the chart: cats = Reference(worksheet=sheet, min_row=1, max_row=1, min_col=2, max_col=13) chart.set_categories(cats) Add this piece of code before saving the workbook, and you should see the month names appearing instead of numbers: Code-wise, this is a minimal change. But in terms of the readability of the spreadsheet, this makes it much easier for someone to open the spreadsheet and understand the chart straight away. Another thing you can do to improve the chart readability is to add an axis. You can do it using the attributes x_axis and y_axis: chart.x_axis.title = "Months" chart.y_axis.title = "Sales (per unit)" This will generate a spreadsheet like the below one: As you can see, small changes like the above make reading your chart a much easier and quicker task. There is also a way to style your chart by using Excel's default ChartStyle property. In this case, you have to choose a number between 1 and 48. Depending on your choice, the colors of your chart change as well: # You can play with this by choosing any number between 1 and 48 chart.style = 24 With the style selected above, all lines have some shade of orange: There is no clear documentation on what each style number looks like, but this spreadsheet has a few examples of the styles available. Here's the full code used to generate the line chart with categories, axis titles, and style: import random from openpyxl import Workbook from openpyxl.chart import LineChart, Reference workbook = Workbook() sheet = workbook.active # Let's create some sample sales data rows = [ ["", "January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"], [1, ], [2, ], [3, ], ] for row in rows: sheet.append(row) for row in sheet.iter_rows(min_row=2, max_row=4, min_col=2, max_col=13): for cell in row: cell.value = random.randrange(5, 100) # Create a LineChart and add the main data chart = LineChart() data = Reference(worksheet=sheet, min_row=2, max_row=4, min_col=1, max_col=13) chart.add_data(data, titles_from_data=True, from_rows=True) # Add categories to the chart cats = Reference(worksheet=sheet, min_row=1, max_row=1, min_col=2, max_col=13) chart.set_categories(cats) # Rename the X and Y Axis chart.x_axis.title = "Months" chart.y_axis.title = "Sales (per unit)" # Apply a specific Style chart.style = 24 # Save! sheet.add_chart(chart, "C6") workbook.save("line_chart.xlsx") There are a lot more chart types and customization you can apply, so be sure to check out the package documentation on this if you need some specific formatting.
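The same add_data()/set_categories() pattern carries over to the other chart classes. As an extra illustration (my own sketch with made-up numbers, not from the tutorial), here is how a pie chart could be built:

from openpyxl import Workbook
from openpyxl.chart import PieChart, Reference

workbook = Workbook()
sheet = workbook.active

# Hypothetical sales-by-channel data
rows = [
    ["Channel", "Sales"],
    ["Online", 340],
    ["Store", 255],
    ["Partners", 80],
]
for row in rows:
    sheet.append(row)

chart = PieChart()
data = Reference(worksheet=sheet, min_row=1, max_row=4, min_col=2, max_col=2)
labels = Reference(worksheet=sheet, min_row=2, max_row=4, min_col=1, max_col=1)
chart.add_data(data, titles_from_data=True)
chart.set_categories(labels)
chart.title = "Sales by channel"

sheet.add_chart(chart, "D2")
workbook.save("pie_chart.xlsx")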

    Convert Python Classes to Excel Spreadsheet

    You already saw how to convert an Excel spreadsheet's data into Python classes, but now let's do the opposite. Let's imagine you have a database and are using some Object-Relational Mapping (ORM) to map DB objects into Python classes. Now, you want to export those same objects into a spreadsheet. Let's assume the following data classes to represent the data coming from your database regarding product sales: from dataclasses import dataclass from typing import List @dataclass class Sale: quantity: int @dataclass class Product: id: str name: str sales: List[Sale] Now, let's generate some random data, assuming the above classes are stored in a db_classes.py file: 1import random 2 3# Ignore these for now. You'll use them in a sec ;) 4from openpyxl import Workbook 5from openpyxl.chart import LineChart, Reference 6 7from db_classes import Product, Sale 8 9products = [] 10 11# Let's create 5 products 12for idx in range(1, 6): 13 sales = [] 14 15 # Create 5 months of sales 16 for _ in range(5): 17 sale = Sale(quantity=random.randrange(5, 100)) 18 sales.append(sale) 19 20 product = Product(id=str(idx), 21 name="Product %s" % idx, 22 sales=sales) 23 products.append(product) By running this piece of code, you should get 5 products with 5 months of sales with a random quantity of sales for each month. Now, to convert this into a spreadsheet, you need to iterate over the data and append it to the spreadsheet: 25workbook = Workbook() 26sheet = workbook.active 27 28# Append column names first 29sheet.append(["Product ID", "Product Name", "Month 1", 30 "Month 2", "Month 3", "Month 4", "Month 5"]) 31 32# Append the data 33for product in products: 34 data = [product.id, product.name] 35 for sale in product.sales: 36 data.append(sale.quantity) 37 sheet.append(data) That's it. That should allow you to create a spreadsheet with some data coming from your database. However, why not use some of that cool knowledge you gained recently to add a chart as well to display that data more visually? All right, then you could probably do something like this: 38chart = LineChart() 39data = Reference(worksheet=sheet, 40 min_row=2, 41 max_row=6, 42 min_col=2, 43 max_col=7) 44 45chart.add_data(data, titles_from_data=True, from_rows=True) 46sheet.add_chart(chart, "B8") 47 48cats = Reference(worksheet=sheet, 49 min_row=1, 50 max_row=1, 51 min_col=3, 52 max_col=7) 53chart.set_categories(cats) 54 55chart.x_axis.title = "Months" 56chart.y_axis.title = "Sales (per unit)" 57 58workbook.save(filename="oop_sample.xlsx") Now we're talking! Here's a spreadsheet generated from database objects and with a chart and everything: That's a great way for you to wrap up your new knowledge of charts!

    Bonus: Working With Pandas

    Even though you can use Pandas to handle Excel files, there are few things that you either can't accomplish with Pandas or that you'd be better off just using openpyxl directly. For example, some of the advantages of using openpyxl are the ability to easily customize your spreadsheet with styles, conditional formatting, and such. But guess what, you don't have to worry about picking. In fact, openpyxl has support for both converting data from a Pandas DataFrame into a workbook or the opposite, converting an openpyxl workbook into a Pandas DataFrame. Note: If you're new to Pandas, check our course on Pandas DataFrames beforehand. First things first, remember to install the pandas package: $ pip install pandas Then, let's create a sample DataFrame: 1import pandas as pd 2 3data = { 4 "Product Name": ["Product 1", "Product 2"], 5 "Sales Month 1": [10, 20], 6 "Sales Month 2": [5, 35], 7} 8df = pd.DataFrame(data) Now that you have some data, you can use .dataframe_to_rows() to convert it from a DataFrame into a worksheet: 10from openpyxl import Workbook 11from openpyxl.utils.dataframe import dataframe_to_rows 12 13workbook = Workbook() 14sheet = workbook.active 15 16for row in dataframe_to_rows(df, index=False, header=True): 17 sheet.append(row) 18 19workbook.save("pandas.xlsx") You should see a spreadsheet that looks like this: If you want to add the DataFrame's index, you can change index=True, and it adds each row's index into your spreadsheet. On the other hand, if you want to convert a spreadsheet into a DataFrame, you can also do it in a very straightforward way like so: import pandas as pd from openpyxl import load_workbook workbook = load_workbook(filename="sample.xlsx") sheet = workbook.active values = sheet.values df = pd.DataFrame(values) Alternatively, if you want to add the correct headers and use the review ID as the index, for example, then you can also do it like this instead: import pandas as pd from openpyxl import load_workbook from mapping import REVIEW_ID workbook = load_workbook(filename="sample.xlsx") sheet = workbook.active data = sheet.values # Set the first row as the columns for the DataFrame cols = next(data) data = list(data) # Set the field "review_id" as the indexes for each row idx = [row[REVIEW_ID] for row in data] df = pd.DataFrame(data, index=idx, columns=cols) Using indexes and columns allows you to access data from your DataFrame easily: >>> >>> df.columns Index(['marketplace', 'customer_id', 'review_id', 'product_id', 'product_parent', 'product_title', 'product_category', 'star_rating', 'helpful_votes', 'total_votes', 'vine', 'verified_purchase', 'review_headline', 'review_body', 'review_date'], dtype='object') >>> # Get first 10 reviews' star rating >>> df["star_rating"][:10] R3O9SGZBVQBV76 5 RKH8BNC3L5DLF 5 R2HLE8WKZSU3NL 2 R31U3UH5AZ42LL 5 R2SV659OUJ945Y 4 RA51CP8TR5A2L 5 RB2Q7DLDN6TH6 5 R2RHFJV0UYBK3Y 1 R2Z6JOQ94LFHEP 5 RX27XIIWY5JPB 4 Name: star_rating, dtype: int64 >>> # Grab review with id "R2EQL1V1L6E0C9", using the index >>> df.loc["R2EQL1V1L6E0C9"] marketplace US customer_id 15305006 review_id R2EQL1V1L6E0C9 product_id B004LURNO6 product_parent 892860326 review_headline Five Stars review_body Love it review_date 2015-08-31 Name: R2EQL1V1L6E0C9, dtype: object There you go, whether you want to use openpyxl to prettify your Pandas dataset or use Pandas to do some hardcore algebra, you now know how to switch between both packages.
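For completeness, pandas can also read the spreadsheet directly and delegate the file parsing to openpyxl. A short sketch, assuming the same sample.xlsx file:

import pandas as pd

# pandas uses openpyxl under the hood to read .xlsx files
df = pd.read_excel("sample.xlsx", engine="openpyxl")
df = df.set_index("review_id")

print(df["star_rating"].mean())
print(df.loc["R2EQL1V1L6E0C9", "review_headline"])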

    run Python script in HTML

You can run a Python file from an HTML page by using PHP. Add a PHP file as index.php: <html> <head> <title>Run my Python files</title> <?php echo shell_exec("python test.py 'parameter1'"); ?> </head> </html> Passing the parameter to Python: create a Python file as test.py: import sys input = sys.argv[1] print(input) This prints the parameter passed in by PHP.

    OHLC Charts in Python

    import plotly.graph_objects as go import pandas as pd df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/finance-charts-apple.csv') fig = go.Figure(data=go.Ohlc(x=df['Date'], open=df['AAPL.Open'], high=df['AAPL.High'], low=df['AAPL.Low'], close=df['AAPL.Close'])) fig.show()
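A candlestick chart takes the same arguments as go.Ohlc, and the layout can be tweaked afterwards. The variation below is a sketch on top of the same dataset:

import plotly.graph_objects as go
import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/finance-charts-apple.csv')

# Candlestick uses the same x/open/high/low/close arguments as Ohlc
fig = go.Figure(data=go.Candlestick(x=df['Date'],
                                    open=df['AAPL.Open'],
                                    high=df['AAPL.High'],
                                    low=df['AAPL.Low'],
                                    close=df['AAPL.Close']))

# Optional cosmetics: add a title and hide the range slider below the chart
fig.update_layout(title_text="AAPL OHLC", xaxis_rangeslider_visible=False)
fig.show()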

    execute a script within the Python interpreter

    exec(open("test.py").read())

    PaddleOCR

1. Prerequisites: Make sure the following prerequisites are installed on your machine: Python (3.6 or later), the PaddleOCR library, and any other required dependencies (for example NumPy, pandas, and so on). You can install PaddleOCR with the following pip command: pip install paddleocr 2. Setting up PaddleOCR: Once Python and the required libraries are installed, let's set up PaddleOCR. You can use PaddleOCR's pre-trained models, which are available for both text detection and recognition. Code overview: the code snippets for text detection and recognition with PaddleOCR include the following main components: Image preprocessing: load the input image and perform any necessary preprocessing steps, such as resizing or normalization. Text detection: use the PaddleOCR text detection model to locate bounding boxes around the text regions in the input image. Text recognition: for each detected bounding box, use the PaddleOCR text recognition model to extract the corresponding text. Post-processing: collect the detected text and the recognition results for further analysis or display. 3. Step-by-step implementation: let's break the code down and explain each step in detail. Text detection: the code is part of a class named DecMain, designed to evaluate optical character recognition (OCR) against ground-truth data. It uses PaddleOCR to extract text from images and then computes metrics (such as precision, recall, and character error rate [CER]) to evaluate the OCR system's performance. class DecMain: def __init__(self, image_folder_path, label_file_path, output_file): self.image_folder_path = image_folder_path self.label_file_path = label_file_path self.output_file = output_file def run_dec(self): # Check and update the ground truth file CheckAndUpdateGroundTruth(self.label_file_path).check_and_update_ground_truth_file() df = OcrToDf(image_folder=self.image_folder_path, label_file=self.label_file_path, det=True, rec=True, cls=False).ocr_to_df() ground_truth_data = ReadGroundTruthFile(self.label_file_path).read_ground_truth_file() # Get the extracted text as a list of dictionaries (representing the OCR results) ocr_results = df.to_dict(orient="records") # Calculate precision, recall, and CER precision, recall, total_samples = CalculateMetrics(ground_truth_data, ocr_results).calculate_precision_recall() CreateSheet(dataframe=df, precision=precision, recall=recall, total_samples=total_samples, file_name=self.output_file).create_sheet() Let's break the code down and explain each part: class DecMain: def __init__(self, image_folder_path, label_file_path, output_file): self.image_folder_path = image_folder_path self.label_file_path = label_file_path self.output_file = output_file The DecMain class has an __init__ method that initializes the object with the following parameters: image_folder_path: path to the folder containing the input images for OCR. label_file_path: path to the ground-truth label file containing the actual text content of the images. output_file: file name of the output file where the evaluation results will be saved. def run_dec(self): # Check and update the ground truth file CheckAndUpdateGroundTruth(self.label_file_path).check_and_update_ground_truth_file() The run_dec method is responsible for running the OCR evaluation process. First, it uses the CheckAndUpdateGroundTruth class to check and update the ground-truth label file. df = OcrToDf(image_folder=self.image_folder_path, label_file=self.label_file_path, det=True, rec=True, cls=False).ocr_to_df() The OcrToDf class converts the OCR results into a pandas DataFrame (df). It takes the following parameters: image_folder: path to the folder containing the OCR input images. label_file: path to the ground-truth label file. The det=True and rec=True arguments indicate that the DataFrame will contain both text detection and recognition results. ground_truth_data = ReadGroundTruthFile(self.label_file_path).read_ground_truth_file() The ReadGroundTruthFile class reads the ground-truth label file and loads its contents into the ground_truth_data variable. # Get the extracted text as a list of dictionaries (representing the OCR results) ocr_results = df.to_dict(orient="records") The OCR results from the DataFrame df are converted into a list of dictionaries (ocr_results), where each dictionary represents the OCR results for a single image. # Calculate precision, recall, and CER precision, recall, total_samples = CalculateMetrics(ground_truth_data, ocr_results).calculate_precision_recall() The CalculateMetrics class computes the OCR evaluation metrics: precision, recall, and the total number of evaluated samples. It takes the ground-truth data and the OCR results as input. CreateSheet(dataframe=df, precision=precision, recall=recall, total_samples=total_samples, file_name=self.output_file).create_sheet() The CreateSheet class is responsible for creating the output sheet (for example, an Excel or CSV file) containing the evaluation metrics and the OCR results. It takes the DataFrame df, the precision, the recall, the total sample count, and the output file name as input. Overall, the DecMain class provides a structured way to evaluate the performance of an OCR model using ground-truth data and PaddleOCR's text detection and recognition capabilities. It computes the important evaluation metrics and stores the results in the specified output file for further analysis. Note on the ground-truth label file format: to run OCR evaluation with the DecMain class and the code above, the ground-truth label file must be formatted correctly. It should be in JSON format with a structure like the following: image_name.jpg [{"transcription": "215mm 18", "points": [[199, 6], [357, 6], [357, 33], [199, 33]], "difficult": False, "key_cls": "digits"},
{"transcription": "XZE SA", "points": [[15, 6], [140, 6], [140, 36], [15, 36]], "difficult": False, "key_cls": "text"}] Each line of the file represents the OCR ground truth for one image: the image file name, followed by the OCR results for that image as a JSON object. The JSON object should contain the following keys: "transcription": the ground-truth text transcription for the image. "points": a list of four points representing the bounding-box coordinates of the text region in the image. "difficult": a boolean indicating whether the text region is difficult to recognize. "key_cls": the category label of the OCR result, for example "digits" or "text". Make sure you follow this format when creating the ground-truth label file so that the OCR model's performance can be evaluated accurately. Text recognition: the code defines a class named RecMain that runs text recognition (OCR) over a folder of images using a pre-trained OCR model and generates an evaluation Excel sheet. class RecMain: def __init__(self, image_folder, rec_file, output_file): self.image_folder = image_folder self.rec_file = rec_file self.output_file = output_file def run_rec(self): image_paths = GetImagePathsFromFolder(self.image_folder, self.rec_file).get_image_paths_from_folder() ocr_model = LoadRecModel().load_model() results = ProcessImages(ocr=ocr_model, image_paths=image_paths).process_images() ground_truth_data = ConvertTextToDict(self.rec_file).convert_txt_to_dict() model_predictions, ground_truth_texts, image_names, precision, recall, \ overall_model_precision, overall_model_recall, cer_data_list = EvaluateRecModel(results, ground_truth_data).evaluate_model() # Create Excel sheet CreateMetricExcel(image_names, model_predictions, ground_truth_texts, precision, recall, cer_data_list, overall_model_precision, overall_model_recall, self.output_file).create_excel_sheet() Let's break the code down and explain each part: class RecMain: def __init__(self, image_folder, rec_file, output_file): self.image_folder = image_folder self.rec_file = rec_file self.output_file = output_file The RecMain class has an __init__ method that initializes the object with the following parameters: image_folder: path to the folder containing the input images for text recognition. rec_file: path to the ground-truth label file containing the actual text content of the images. output_file: file name of the output Excel sheet where the evaluation results will be saved. def run_rec(self): image_paths = GetImagePathsFromFolder(self.image_folder, self.rec_file).get_image_paths_from_folder() The run_rec method is responsible for running the text recognition process. It first uses the GetImagePathsFromFolder class to get a list of image paths for all images inside the specified image_folder. This step ensures that the OCR model will process every image in the given directory. ocr_model = LoadRecModel().load_model() The LoadRecModel class loads the pre-trained OCR model used for text recognition. It may use PaddleOCR or another OCR library to load the model. results = ProcessImages(ocr=ocr_model, image_paths=image_paths).process_images() The ProcessImages class is responsible for processing the images with the loaded OCR model. It takes the OCR model (ocr_model) and the list of image paths (image_paths) as input. ground_truth_data = ConvertTextToDict(self.rec_file).convert_txt_to_dict() The ConvertTextToDict class reads the ground-truth label file and converts it into a dictionary (ground_truth_data). This conversion prepares the ground-truth data for comparison against the OCR model's predictions. model_predictions, ground_truth_texts, image_names, precision, recall, \ overall_model_precision, overall_model_recall, cer_data_list = EvaluateRecModel(results, ground_truth_data).evaluate_model() The EvaluateRecModel class compares the OCR model's predictions with the ground-truth data and computes evaluation metrics such as precision, recall, and character error rate (CER). It takes the OCR model's predictions (results) and the ground-truth data (ground_truth_data) as input. # Create Excel sheet CreateMetricExcel(image_names, model_predictions, ground_truth_texts, precision, recall, cer_data_list, overall_model_precision, overall_model_recall, self.output_file).create_excel_sheet() The CreateMetricExcel class is responsible for creating the output Excel sheet containing the evaluation metrics and the OCR results. It accepts various inputs, including the image names, the model predictions, the ground-truth texts, the evaluation metrics, and the output file name (self.output_file). In short, the RecMain class organizes the whole text recognition pipeline, from loading the OCR model to generating an evaluation Excel sheet with detailed metrics. It provides an organized and reusable way to evaluate an OCR model's performance on a given set of images. Note on the ground-truth text file format: when running OCR evaluation with the RecMain class and the code above, it is essential that the ground-truth (GT) text file is formatted correctly. The GT text file should use the following format: image_name.jpg text Each line of the file represents the GT text for one image: the image file name, followed by a tab character (\t), followed by the GT text for that image. Make sure the GT text file contains a GT entry for every image in the image folder, and that the GT text matches the actual text content of the image. This format is required for an accurate evaluation of the OCR model's performance. You can find the source code here: https://github.com/vinodbaste/paddleOCR_rec_dec?source=post_page Conclusion: we explored how to perform text detection and recognition with the deep-learning-based PaddleOCR and walked through the implementation step by step. With PaddleOCR's powerful pre-trained models and easy-to-use API, running OCR on images becomes much easier.

WeChat Article Spider: wechat_articles_spider

wechat_articles_spider is an open-source Python tool for crawling WeChat Official Account articles. It has the following features: Automated crawling: it can automatically fetch article data from a specified WeChat Official Account, removing the tedious manual copy-and-paste work. Multithreading support: the tool supports multithreaded operation and can process several accounts at the same time, improving crawling efficiency. Highly customizable: users can configure the crawling scope, time interval, storage format, and other parameters to suit different use cases. Data persistence: crawled article data can easily be saved locally or to a database for later analysis and use. Installation and usage Step 1: Make sure Python is installed on your system and the pip package manager is available. Step 2: Open a terminal or command prompt and run the following command to install wechat_articles_spider: pip install wechatarticles Step 3: After installation, you can use the tool by importing the wechat_articles_spider module: import wechat_articles_spider Example code The following simple example shows how wechat_articles_spider can be used to crawl WeChat Official Account articles: import wechat_articles_spider # Create a spider instance spider = wechat_articles_spider.WechatSpider() # Set the name of the Official Account to crawl spider.set_official_account("official account name") # Set the number of articles to crawl spider.set_article_count(10) # Start crawling spider.start() # Get the crawl results articles = spider.get_articles() # Print article titles and links for article in articles: print("Title:", article['title']) print("URL:", article['url']) Use cases wechat_articles_spider can be applied in many scenarios, including but not limited to: Data analysis and mining: crawling WeChat Official Account articles yields large amounts of text data for data analysis, sentiment analysis, keyword extraction, and similar tasks. News and media monitoring: it can monitor article updates from specific accounts and retrieve relevant news promptly. Pros and cons Pros: simple to use, with a rich set of features and configuration options; efficient and fast, with multithreading support; customizable, so users can define the crawling scope and parameters to fit their needs. Cons: it depends on the page structure of WeChat Official Accounts, so if that structure changes the code may need to be adapted; when using the tool you must comply with applicable laws, regulations, and the site's terms of use, and avoid abuse or infringing on others' rights.

    two independent programs to communicate with each other

    The best way for two independent programs to communicate with each other depends on the specific use case and requirements of the programs. Both reading and writing to a file and using a local TCP connection are common methods for inter-process communication. Reading and writing to a file can be a simple and effective way to share data between programs. However, it may not be the best option for real-time communication or when large amounts of data need to be exchanged frequently. Using a local TCP connection can provide more real-time communication and can handle larger amounts of data. However, it requires more setup and configuration, and may not be necessary for simpler communication needs. Both methods are commonly used in inter-process communication. To use a local TCP connection for communication between two independent programs, you need to follow these general steps: Establish a TCP server in one program: Choose one of the programs to act as the server that will listen for incoming connections. Create a TCP socket in the server program and bind it to a specific port. The port number can be any available port that is not already in use. Here's an example of how to set up a TCP server in Python: python import socket # Create a TCP socket server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM) # Bind the socket to a specific address and port server_address = ('localhost', 5000) # Replace 'localhost' with the server's IP address if needed server_socket.bind(server_address) # Listen for incoming connections server_socket.listen(1) # Accept a client connection client_socket, client_address = server_socket.accept() # Now the server is ready to communicate with the client Connect the TCP client to the server: In the other program, create a TCP socket and connect it to the server's IP address and port. Once the connection is established, the client program can send and receive data to/from the server. Here's an example of how the client program can send and receive data to/from the server using the local TCP connection: python import socket # Create a TCP socket client_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM) # Connect to the server server_address = ('localhost', 5000) # Replace 'localhost' with the server's IP address if needed client_socket.connect(server_address) # Send data to the server data_to_send = "Hello, server!" client_socket.sendall(data_to_send.encode()) # Receive data from the server received_data = client_socket.recv(1024).decode() print("Received data from server:", received_data) # Close the connection client_socket.close() In this example, the client program creates a TCP socket, connects to the server's IP address and port, and sends data to the server using the sendall() method after encoding the data as bytes. It then waits to receive a response from the server using the recv() method, specifying the maximum number of bytes to receive (1024 in this case). The received data is decoded from bytes to a string and printed. On the server side, you can use a similar approach to receive data from the client and send a response back. Remember to replace 'localhost' with the appropriate IP address if the server is running on a different machine. Additionally, you can add exception handling to gracefully handle errors during the connection and communication process. 
Here's an example of how the server can receive data from the client and send a response back using the local TCP connection: python import socket # Create a TCP socket server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM) # Bind the socket to a specific address and port server_address = ('localhost', 5000) # Replace 'localhost' with the server's IP address if needed server_socket.bind(server_address) # Listen for incoming connections server_socket.listen(1) # Accept a client connection client_socket, client_address = server_socket.accept() # Receive data from the client received_data = client_socket.recv(1024).decode() print("Received data from client:", received_data) # Process the received data (e.g., perform calculations, generate a response) # Send a response back to the client response_data = "Hello, client!" client_socket.sendall(response_data.encode()) # Close the connection client_socket.close() server_socket.close() In this example, after accepting the client connection, the server program waits to receive data from the client using the recv() method, specifying the maximum number of bytes to receive (1024 in this case). The received data is then decoded from bytes to a string and processed as needed. In this case, we simply generate a response message. After processing the data and generating a response, the server uses the sendall() method to send the response back to the client. The response data is encoded as bytes before sending. Finally, the server and client sockets are closed to release the resources and terminate the connection. Remember to replace 'localhost' with the appropriate IP address if the server is running on a different machine. Similarly, you can add exception handling to handle errors gracefully during the connection and communication process.

    Reading Dates and Times with Python

    To read alarm dates and times from a file, you can use Python's built-in file handling together with the datetime module. import datetime # Path to the text file containing the alarm dates and times file_path = "path/to/alarms.txt" # List holding the alarm dates and times alarms = [] # Open the text file and read the alarm dates and times with open(file_path, "r") as file: for line in file: alarm = line.strip() alarms.append(alarm) # Process each alarm for alarm in alarms: # Get the current date and time current_datetime = datetime.datetime.now() alarm_datetime = datetime.datetime.strptime(alarm, "%Y-%m-%d %H:%M:%S") # Compute the next alarm date and time if alarm_datetime < current_datetime: next_alarm = alarm_datetime + datetime.timedelta(days=1) else: next_alarm = alarm_datetime # Compute the interval (in seconds) until the alarm fires interval = (next_alarm - current_datetime).total_seconds() # Wait for the interval and then trigger the alarm import time time.sleep(interval) print("Alarm date and time:", alarm) print("Alarm ringing!") In this example, set the file_path variable to the path of the text file containing the alarm dates and times. The code opens the file, reads the alarms line by line, and stores them in the alarms list. Each alarm should be stored in the text file in "YYYY-MM-DD HH:MM:SS" format, one alarm per line. The code then processes each alarm, computes the next alarm date and time, waits for the interval with time.sleep, and triggers the alarm. The example uses print statements to display the alarm date and time and the ringing reminder; adjust this to your needs.

Python's Built-in Database: SQLite

import sqlite3 # Connect to the database conn = sqlite3.connect('example.db') # Create a cursor object cursor = conn.cursor() # Execute a query cursor.execute('SELECT SQLITE_VERSION()') # Print the query result data = cursor.fetchone() print("SQLite version:", data) # SQLite version: ('3.40.1',) # Create a table # Create a table named students with three columns: id, name, and age cursor.execute('''CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)''') # cursor.execute('''CREATE TABLE stocks # (date text, trans text, symbol text, qty real, price real)''') # Insert data # Insert one row into the students table cursor.execute("INSERT INTO students (name, age) VALUES ('张三', 20)") # cursor.execute("INSERT INTO stocks VALUES ('2022-10-28', 'BUY', 'GOOG', 100, 490.1)") # Save the changes conn.commit() # Query all rows in the students table cursor.execute("SELECT * FROM students") rows = cursor.fetchall() # Print the query results for row in rows: print(row) # Update the name of the row with id 1 in the students table to '李四' cursor.execute("UPDATE students SET name=? WHERE id=?", ('李四', 1)) # Query all rows in the students table cursor.execute("SELECT * FROM students") rows = cursor.fetchall() # Print the query results for row in rows: print(row) # Delete the row with id 1 from the students table cursor.execute("DELETE FROM students WHERE id=?", (1,)) # Commit the changes conn.commit() # Close the connection conn.close()
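Two details worth knowing on top of the example above: the connection object can be used as a context manager (it commits on success and rolls back on error, but does not close the connection), and executemany() inserts many parameterized rows in one call. A minimal sketch:

import sqlite3

conn = sqlite3.connect("example.db")

# The connection as a context manager wraps the block in a transaction
with conn:
    conn.execute("CREATE TABLE IF NOT EXISTS students (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
    conn.executemany(
        "INSERT INTO students (name, age) VALUES (?, ?)",
        [("Alice", 21), ("Bob", 23), ("Carol", 22)],
    )

for row in conn.execute("SELECT id, name, age FROM students ORDER BY age"):
    print(row)

conn.close()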

    Python SQLite

To use the sqlite3 module, you must first create a connection object that represents the database; then, optionally, you can create a cursor object to help you execute your SQL statements.

    Python sqlite3 module APIs


Following are the important sqlite3 module routines, which should be sufficient for working with SQLite databases from your Python program. If you are looking for a more sophisticated application, refer to the Python sqlite3 module's official documentation.
    Sr.No.API & Description
    1sqlite3.connect(database [,timeout ,other optional arguments])
    This API opens a connection to the SQLite database file.
    You can use ":memory:" to open a database connection to a database that resides in RAM instead of on disk.
    If database is opened successfully, it returns a connection object.
    When a database is accessed by multiple connections, and one of the processes modifies the database, the SQLite database is locked until that transaction is committed.
    The timeout parameter specifies how long the connection should wait for the lock to go away until raising an exception.
    The default for the timeout parameter is 5.0 (five seconds).
    If the given database name does not exist then this call will create the database.
    You can specify filename with the required path as well if you want to create a database anywhere else except in the current directory.
2. connection.cursor([cursorClass])
    This routine creates a cursor which will be used throughout of your database programming with Python.
    This method accepts a single optional parameter cursorClass.
    If supplied, this must be a custom cursor class that extends sqlite3.Cursor.
3. cursor.execute(sql [, optional parameters])
    This routine executes an SQL statement.
    The SQL statement may be parameterized (i.e. placeholders instead of SQL literals).
    The sqlite3 module supports two kinds of placeholders: question marks and named placeholders (named style).
    For example − cursor.execute("insert into people values (?, ?)", (who, age))
4. connection.execute(sql [, optional parameters])
    This routine is a shortcut of the above execute method provided by the cursor object and it creates an intermediate cursor object by calling the cursor method, then calls the cursor's execute method with the parameters given.
5. cursor.executemany(sql, seq_of_parameters)
    This routine executes an SQL command against all parameter sequences or mappings found in the sequence sql.
6. connection.executemany(sql[, parameters])
This routine is a shortcut that creates an intermediate cursor object by calling the cursor method, then calls the cursor's executemany method with the parameters given.
7. cursor.executescript(sql_script)
    This routine executes multiple SQL statements at once provided in the form of script.
    It issues a COMMIT statement first, then executes the SQL script it gets as a parameter.
All the SQL statements should be separated by a semicolon (;).
8. connection.executescript(sql_script)
    This routine is a shortcut that creates an intermediate cursor object by calling the cursor method, then calls the cursor's executescript method with the parameters given.
9. connection.total_changes
This read-only attribute returns the total number of database rows that have been modified, inserted, or deleted since the database connection was opened.
10. connection.commit()
    This method commits the current transaction.
    If you don't call this method, anything you did since the last call to commit() is not visible from other database connections.
11. connection.rollback()
    This method rolls back any changes to the database since the last call to commit().
12. connection.close()
    This method closes the database connection.
    Note that this does not automatically call commit().
    If you just close your database connection without calling commit() first, your changes will be lost!
13. cursor.fetchone()
    This method fetches the next row of a query result set, returning a single sequence, or None when no more data is available.
14. cursor.fetchmany([size=cursor.arraysize])
This routine fetches the next set of rows of a query result, returning a list.
    An empty list is returned when no more rows are available.
    The method tries to fetch as many rows as indicated by the size parameter.
15. cursor.fetchall()
    This routine fetches all (remaining) rows of a query result, returning a list.
    An empty list is returned when no rows are available.
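To make the difference between items 13-15 concrete, here is a small self-contained sketch using an in-memory database:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (who TEXT, age INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?)",
                 [("Ann", 30), ("Ben", 25), ("Cid", 41), ("Dee", 35)])

cursor = conn.execute("SELECT who, age FROM people ORDER BY age")
print(cursor.fetchone())    # ('Ben', 25)   -- a single row
print(cursor.fetchmany(2))  # [('Ann', 30), ('Dee', 35)] -- the next two rows
print(cursor.fetchall())    # [('Cid', 41)] -- whatever is left

conn.close()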

    Connect To Database


The following Python code shows how to connect to an existing database. If the database does not exist, it will be created, and finally a database object will be returned. #!/usr/bin/python import sqlite3 conn = sqlite3.connect('test.db') print("Opened database successfully") Here, you can also supply the database name as the special name :memory: to create a database in RAM. Now, let's run the above program to create our database test.db in the current directory. You can change your path as per your requirement. Keep the above code in a sqlite.py file and execute it as shown below. If the database is successfully created, it will display the following message. $ chmod +x sqlite.py $ ./sqlite.py Opened database successfully

    Create a Table


The following Python program will be used to create a table in the previously created database. #!/usr/bin/python import sqlite3 conn = sqlite3.connect('test.db') print("Opened database successfully") conn.execute('''CREATE TABLE COMPANY (ID INT PRIMARY KEY NOT NULL, NAME TEXT NOT NULL, AGE INT NOT NULL, ADDRESS CHAR(50), SALARY REAL);''') print("Table created successfully") conn.close() When the above program is executed, it will create the COMPANY table in your test.db and it will display the following messages − Opened database successfully Table created successfully

    INSERT Operation


The following Python program shows how to create records in the COMPANY table created in the above example. #!/usr/bin/python import sqlite3 conn = sqlite3.connect('test.db') print("Opened database successfully") conn.execute("INSERT INTO COMPANY (ID,NAME,AGE,ADDRESS,SALARY) \ VALUES (1, 'Paul', 32, 'California', 20000.00 )") conn.execute("INSERT INTO COMPANY (ID,NAME,AGE,ADDRESS,SALARY) \ VALUES (2, 'Allen', 25, 'Texas', 15000.00 )") conn.execute("INSERT INTO COMPANY (ID,NAME,AGE,ADDRESS,SALARY) \ VALUES (3, 'Teddy', 23, 'Norway', 20000.00 )") conn.execute("INSERT INTO COMPANY (ID,NAME,AGE,ADDRESS,SALARY) \ VALUES (4, 'Mark', 25, 'Rich-Mond ', 65000.00 )") conn.commit() print("Records created successfully") conn.close() When the above program is executed, it will create the given records in the COMPANY table and it will display the following two lines − Opened database successfully Records created successfully

    SELECT Operation


The following Python program shows how to fetch and display records from the COMPANY table created in the above example. #!/usr/bin/python import sqlite3 conn = sqlite3.connect('test.db') print("Opened database successfully") cursor = conn.execute("SELECT id, name, address, salary from COMPANY") for row in cursor: print("ID = ", row[0]) print("NAME = ", row[1]) print("ADDRESS = ", row[2]) print("SALARY = ", row[3], "\n") print("Operation done successfully") conn.close() When the above program is executed, it will produce the following result. Opened database successfully ID = 1 NAME = Paul ADDRESS = California SALARY = 20000.0 ID = 2 NAME = Allen ADDRESS = Texas SALARY = 15000.0 ID = 3 NAME = Teddy ADDRESS = Norway SALARY = 20000.0 ID = 4 NAME = Mark ADDRESS = Rich-Mond SALARY = 65000.0 Operation done successfully

    UPDATE Operation


The following Python code shows how to use an UPDATE statement to update a record and then fetch and display the updated records from the COMPANY table. #!/usr/bin/python import sqlite3 conn = sqlite3.connect('test.db') print("Opened database successfully") conn.execute("UPDATE COMPANY set SALARY = 25000.00 where ID = 1") conn.commit() print("Total number of rows updated:", conn.total_changes) cursor = conn.execute("SELECT id, name, address, salary from COMPANY") for row in cursor: print("ID =", row[0]) print("NAME =", row[1]) print("ADDRESS =", row[2]) print("SALARY =", row[3], "\n") print("Operation done successfully") conn.close() When the program is executed, it produces the following result. Opened database successfully Total number of rows updated : 1 ID = 1 NAME = Paul ADDRESS = California SALARY = 25000.0 ID = 2 NAME = Allen ADDRESS = Texas SALARY = 15000.0 ID = 3 NAME = Teddy ADDRESS = Norway SALARY = 20000.0 ID = 4 NAME = Mark ADDRESS = Rich-Mond SALARY = 65000.0 Operation done successfully

    DELETE Operation


The following Python code shows how to use a DELETE statement to delete a record and then fetch and display the remaining records from the COMPANY table. #!/usr/bin/python import sqlite3 conn = sqlite3.connect('test.db') print("Opened database successfully") conn.execute("DELETE from COMPANY where ID = 2;") conn.commit() print("Total number of rows deleted:", conn.total_changes) cursor = conn.execute("SELECT id, name, address, salary from COMPANY") for row in cursor: print("ID =", row[0]) print("NAME =", row[1]) print("ADDRESS =", row[2]) print("SALARY =", row[3], "\n") print("Operation done successfully") conn.close() When the program is executed, it produces the following result. Opened database successfully Total number of rows deleted : 1 ID = 1 NAME = Paul ADDRESS = California SALARY = 20000.0 ID = 3 NAME = Teddy ADDRESS = Norway SALARY = 20000.0 ID = 4 NAME = Mark ADDRESS = Rich-Mond SALARY = 65000.0 Operation done successfully
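As a follow-up to the examples above, a minimal sketch of two habits worth adopting with sqlite3: parameterized queries (placeholders instead of values pasted into the SQL string) and using the connection as a context manager so the transaction is committed or rolled back automatically. The inserted row here is just illustrative data:

import sqlite3

with sqlite3.connect('test.db') as conn:
    # the with-block commits on success and rolls back if an exception is raised
    conn.execute(
        "INSERT INTO COMPANY (ID, NAME, AGE, ADDRESS, SALARY) VALUES (?, ?, ?, ?, ?)",
        (5, 'Eve', 30, 'Ohio', 30000.00),
    )
    for row in conn.execute("SELECT id, name, salary FROM COMPANY WHERE salary > ?", (20000,)):
        print(row)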

    AI can provide valuable assistance in learning programming

AI can assist with learning programming, and Python specifically, in the following ways:
• Interactive learning platforms: AI-powered platforms can offer interactive lessons and tutorials for learning Python, with step-by-step instructions, coding challenges, and interactive coding environments where learners practice writing and executing code. AI algorithms can analyze learners' code and provide immediate feedback, helping them identify and correct errors.
• Intelligent code autocompletion: AI-based code editors and integrated development environments (IDEs) can offer context-aware autocompletion suggestions based on syntax and common programming patterns, helping learners explore options, reduce syntax errors, and work more efficiently.
• Error detection and debugging: by analyzing code syntax, structure, and runtime behavior, AI can flag potential errors, suggest corrections, and explain common mistakes, helping learners understand and resolve issues more effectively.
• Code generation and examples: AI can generate Python snippets or complete functions from specified requirements, which is especially helpful for beginners, and it can provide real-life examples of Python usage in various applications and domains.
• Natural language processing (NLP): NLP capabilities help with understanding Python documentation, tutorials, and forums by extracting relevant information and explaining it in a more accessible format, which aids comprehension of complex concepts and syntax.
• Intelligent recommendations: AI can recommend learning resources, tutorials, and projects based on a learner's proficiency level, interests, and areas for improvement, helping them discover additional material and explore advanced topics at their own pace.
• Collaborative learning and coding communities: AI can connect learners with peers, mentors, and experts, matching people of similar interests or skill levels for group projects, coding challenges, and code reviews, which fosters peer support and knowledge sharing.
• AI-based Python libraries and frameworks: libraries such as TensorFlow, PyTorch, and scikit-learn provide powerful tools for building AI and machine-learning applications in Python, and learning them opens up opportunities to apply AI techniques within your own programs.
It's important to note that while AI can assist in learning Python, hands-on practice, active problem-solving, and engagement with exercises and projects remain crucial for developing programming skills. AI is a supportive tool that enhances the learning experience; it should not replace practical coding experience and conceptual understanding.
Here are a few interactive learning platforms that can help you learn Python programming:
• Codecademy (www.codecademy.com): interactive Python courses that guide learners through coding exercises, projects, and quizzes, covering topics from Python basics to advanced concepts.
• Coursera (www.coursera.org): a variety of Python courses offered by universities and institutions worldwide, often including interactive coding exercises, video lectures, and assignments.
• DataCamp (www.datacamp.com): specializes in data science, with interactive Python courses focused on data analysis, visualization, and machine learning, following a learn-by-doing approach with coding exercises and real-world projects.
• edX (www.edx.org): Python courses from renowned universities and institutions covering fundamentals, web development, data science, and more, with interactive coding exercises and assessments.
• SoloLearn (www.sololearn.com): a mobile app and web platform with gamified Python courses where learners earn points, compete with peers, and practice coding challenges.
• Codewars (www.codewars.com): a platform for solving coding challenges in various languages, including Python-specific challenges of different difficulty levels, with community solutions to learn from.
• JetBrains Academy (www.jetbrains.com/academy): an interactive learning platform with Python courses and projects, an integrated development environment (IDE), and step-by-step guidance for building real-world applications.
Remember, while these platforms provide interactive learning experiences, it's important to practice coding regularly, work on projects, and engage in problem-solving to solidify your Python programming skills.

    create new project process

Create the project directory and enter it, then:
mkdir <dir>
cd <dir>
git init
git remote add origin git@github.com:$USERNAME/$1.git
touch README.md
git add .
git commit -m "Initial commit"
git push -u origin master
code .

    Build Android Apps with Flet in Python

    Build Android Apps with Flet in Python (APKs)

Pygubu: rapid development of Python tkinter user interfaces

pip install pygubu-designer Pygubu is a RAD tool that enables quick and easy development of user interfaces for Python's tkinter module. The user interfaces you design are saved as XML files and, by using the pygubu builder, can be loaded by applications dynamically as needed. https://github.com/alejandroautalan/pygubu-designer Usage: type the following command in the terminal. C:\Python3\Scripts\pygubu-designer.exe where C:\Python3 is the path to your Python installation directory.
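A minimal sketch of loading a designed UI at runtime, following the pattern in the pygubu README; "helloworld.ui" and "mainwindow" are placeholder names for a UI file and its top-level widget created in pygubu-designer:

import tkinter as tk
import pygubu

builder = pygubu.Builder()
builder.add_from_file("helloworld.ui")                # XML file saved by pygubu-designer
root = tk.Tk()
mainwindow = builder.get_object("mainwindow", root)   # id of the top-level widget in the UI file
root.mainloop()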

    wechat

Automating WeChat with Python · Sending notifications to WeChat with Python · Controlling WeChat with Python · Operating WeChat from Python

    Could not install packages due to an OSError:

[WinError 2] No such file or directory. The system cannot find the file specified: 'C:\\Python311\\Scripts\\chardetect.exe' -> 'C:\\Python311\\Scripts\\chardetect.exe.deleteme' Try running the command from an administrator prompt, or use pip install numpy --user to install numpy into your user site-packages without any special privileges.

Data Analysis: Principal Component Analysis (PCA)





Principal Component Analysis (PCA) is a statistical technique for dimensionality reduction. Its goal is to reduce the number of dimensions in a dataset by transforming the original data into a new set of uncorrelated variables (called principal components), while retaining the most important information in the data.

Steps of PCA


1. Standardize the data: because different features may have different units and scales, the data should be standardized (usually to zero mean and unit variance) before running PCA.
2. Compute the covariance matrix: the covariance matrix measures the linear correlation between the features of the dataset.
3. Compute eigenvalues and eigenvectors: the eigenvalues and eigenvectors of the covariance matrix determine the principal components; an eigenvalue gives the amount of variance a component explains, and the corresponding eigenvector gives its direction.
4. Select the principal components: sort by eigenvalue and keep the eigenvectors of the top k eigenvalues as the principal components.
5. Transform the data: project the original data onto the selected components to obtain the reduced dataset.
(A minimal NumPy sketch of these steps follows below.)
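A minimal NumPy sketch of the steps above, using randomly generated data purely for illustration:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                 # toy data: 100 samples, 5 features

# 1. standardize (zero mean, unit variance per feature)
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. covariance matrix of the standardized features
cov = np.cov(X_std, rowvar=False)

# 3. eigenvalues / eigenvectors (eigh, since the covariance matrix is symmetric)
eigvals, eigvecs = np.linalg.eigh(cov)

# 4. sort components by explained variance and keep the top k
order = np.argsort(eigvals)[::-1]
k = 2
components = eigvecs[:, order[:k]]

# 5. project the data onto the selected components
X_reduced = X_std @ components
print(X_reduced.shape)                        # (100, 2)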

When PCA is useful


Dimensionality reduction: reduce the number of dimensions of a high-dimensional dataset while keeping the most important information. Noise removal: drop the minor components that explain little variance to reduce noise in the data. Data visualization: project high-dimensional data into two or three dimensions so it can be plotted. Feature selection: identify and select the features that contribute most to the variation in the data.

Advantages


Effective dimensionality reduction: reduces the number of dimensions while keeping the main information. Removes redundancy: eliminates multicollinearity between features and simplifies the model. Better computational efficiency: lower dimensionality speeds up downstream algorithms.

Disadvantages


Poor interpretability: principal components are linear combinations of the original features and are hard to interpret in terms of their original meaning. Information loss: some information may be lost during reduction, especially when only a few components are kept. Linearity assumption: PCA assumes linear relationships between features and works poorly for non-linear structure.

Things to watch out for


Data standardization: always standardize the data before running PCA. Component selection: choose a suitable number of components, usually via the cumulative explained variance. Outliers: watch for outliers in the data, because they can strongly influence the PCA result.

How to choose the optimal number of components


The optimal number of principal components is usually chosen from the cumulative explained variance. Concretely: compute the explained variance of each component, compute the cumulative explained variance, and plot it (a scree plot). Then pick the elbow: typically the number of components at which the cumulative explained variance reaches about 90%, or where the per-component explained variance drops off sharply.

Demonstration with simulated data


We will use Python to simulate an e-commerce dataset with several features, reduce its dimensionality with PCA, and find the optimal number of components. First, generate the simulated data and standardize it:
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

np.random.seed(0)
data = {
    'customer_id': np.arange(1, 101),
    'age': np.random.randint(18, 70, size=100),
    'annual_income': np.random.randint(20000, 150000, size=100),
    'spending_score': np.random.randint(1, 100, size=100),
    'years_as_customer': np.random.randint(1, 10, size=100),
    'total_purchases': np.random.randint(1, 50, size=100),
    'average_purchase_value': np.random.randint(10, 1000, size=100),
    'purchase_frequency': np.random.randint(1, 12, size=100)
}
df = pd.DataFrame(data)
features = df.drop('customer_id', axis=1)
scaler = StandardScaler()
scaled_features = scaler.fit_transform(features)
Next, run PCA on the standardized data and compute the cumulative explained variance:
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

pca = PCA()
pca_features = pca.fit_transform(scaled_features)
# cumulative explained variance
explained_variance = np.cumsum(pca.explained_variance_ratio_)
# scree plot
plt.figure(figsize=(10, 6))
plt.plot(range(1, len(explained_variance) + 1), explained_variance, marker='o', linestyle='--')
plt.title('Scree Plot')
plt.xlabel('Number of Components')
plt.ylabel('Cumulative Explained Variance')
plt.axhline(y=0.9, color='r', linestyle='--')
plt.text(1, 0.85, '90% cut-off threshold', color='red', fontsize=12)
plt.show()
print("Explained Variance Ratios:", pca.explained_variance_ratio_)
print("Cumulative Explained Variance:", explained_variance)
Based on the cumulative explained variance plot, we keep the first 3 principal components:
# keep the first 3 principal components
pca = PCA(n_components=3)
pca_features = pca.fit_transform(scaled_features)
# explained variance of each retained component
print("Explained Variance Ratios for 3 Components:", pca.explained_variance_ratio_)
We can visualize the first two principal components with a scatter plot (the hue legend shows the spending score):
import seaborn as sns

plt.figure(figsize=(10, 6))
sns.scatterplot(x=pca_features[:, 0], y=pca_features[:, 1], hue=df['spending_score'], palette='viridis')
plt.title('PCA - first two principal components')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.show()
Finally, we standardize new data and project it through the fitted PCA:
# transform new data
new_data = np.array([[25, 45000, 75, 3, 20, 250, 6]])
new_data_scaled = scaler.transform(new_data)
new_data_pca = pca.transform(new_data_scaled)
print(f"New data PCA transformation: {new_data_pca}")
Through these steps we reduced the high-dimensional e-commerce data with PCA: we walked through each stage, chose the number of components from the cumulative explained variance plot, visualized the reduced data, and showed how to transform new data with the fitted PCA for downstream prediction.

    Download Video in MP3 format using PyTube

Pytube is a lightweight library written in Python. It provides a command-line feature that lets you stream or download videos straight from the terminal. Install it with pip, depending on your Python version: for Python 2: pip install pytube; for Python 3: pip3 install pytube; for the pytube3 fork: pip install pytube3. The os module used below to rename the saved file is part of the standard library, so it needs no installation. Procedure: import the required modules (pytube and os), then run the script. Implementation (Python 3):
# importing packages
from pytube import YouTube
import os

# url input from user
yt = YouTube(str(input("Enter the URL of the video you want to download: \n>> ")))

# extract only audio
video = yt.streams.filter(only_audio=True).first()

# check for destination to save file
print("Enter the destination (leave blank for current directory)")
destination = str(input(">> ")) or '.'

# download the file
out_file = video.download(output_path=destination)

# rename the file with an .mp3 extension
base, ext = os.path.splitext(out_file)
new_file = base + '.mp3'
os.rename(out_file, new_file)

# report success
print(yt.title + " has been successfully downloaded.")

    Cut MP3 file

Before we go further, FFmpeg must be installed on the system, since it is required for working with mp3 files; to download it you can visit: https://phoenixnap.com/kb/ffmpeg-windows. We will also use the pydub library for this task: pip install pydub
Step 1: Open an mp3 file using pydub.
from pydub import AudioSegment
song = AudioSegment.from_mp3("test.mp3")
Step 2: Slice the audio.
# pydub does things in milliseconds
ten_seconds = 10 * 1000
first_10_seconds = song[:ten_seconds]
last_5_seconds = song[-5000:]
Step 3: Save the result as a new file in mp3 format.
first_10_seconds.export("new.mp3", format="mp3")
Example:
from pydub import AudioSegment
# open an mp3 file
song = AudioSegment.from_file("testing.mp3", format="mp3")
# pydub does things in milliseconds
ten_seconds = 10 * 1000
# a clip of the first 10 seconds of the song
first_10_seconds = song[:ten_seconds]
# save the file
first_10_seconds.export("first_10_seconds.mp3", format="mp3")
print("New audio file has been created and saved")

Building a music scraper with Python



Preparation

Environment: Python 3.10 and PyCharm. Modules: requests (pip install requests), parsel (pip install parsel), prettytable (pip install prettytable), and os (standard library, no installation needed). To package the script as an exe: pyinstaller (pip install pyinstaller).

Basic scraping workflow

The case study has three parts: scraping a single song, a search-and-download feature (single or batch), and packaging the .py program into an .exe.

Part 1: Analyzing the data source

 1. Define the goal
Identify the target site and the data to collect. URL: https://www.gequbao.com/music/402856 Data: the song's playback (audio) link.
 2. Packet-capture analysis
Use the browser developer tools to locate the data. Open the developer tools (F12, or right-click, Inspect, then the Network tab), refresh the page, and search by keyword to find where the data lives. First find the song's playback link: Developer tools > Network > Media > inspect the song link. Then search for a fragment of that link as the keyword; the rule of thumb is to search for exactly the data you need. Data packet URL: https://www.gequbao.com/api/play_url?id=402856&json=1

Part 2: Implementation steps

 1. Send the request
Simulate a browser request to the URL:
# import the request module
import requests
"""Send the request"""
# simulate a browser (request headers)
headers = {
    # User-Agent identifies the browser
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36'
}
# request URL
url = 'https://www.gequbao.com/api/play_url?id=402856&json=1'
# send the request
response = requests.get(url=url, headers=headers)
 2. Get the data
Read the response returned by the server: # get the JSON response json_data = response.json()
 3. Parse the data
Extract the fields we need (basic CSS-selector usage: locate the tag, then pull the data out of its attributes).
 4. Save the data
Request the song link and save the audio to a local folder (play_url, download_title and download_singer are values extracted earlier in the full tutorial):
# request the song link and get the audio content
music_content = requests.get(url=play_url, headers=headers).content
# save the data
with open(f'music\\{download_title}-{download_singer}.mp3', mode='wb') as f:
    # write the bytes
    f.write(music_content)
print(f'{download_title} downloaded!')
 5. Search and download
Find the search endpoint (song title and song ID), then compare the data packets of different songs: the only thing that changes is the ID, so once you have all the song IDs you can fetch every song.
 6. Package as an EXE
pyinstaller -F xx.py Once packaged, the program can be shared with friends who do not have Python installed.

Python Scripts: automating all kinds of everyday tasks

    项目地址:https://github.com/DhanushNehru/Python-Scripts 部分脚本介绍 •Arrange It:根据文件扩展名自动将文件移动到相应的文件夹。 •Auto WiFi Check:监控 WiFi 连接是否正常。 •AutoCert:批量生成电子证书。 •Automated Emails:读取 CSV 文件,发送个性化的邮件。 •Black Hat Python:来自《黑帽子 Python》一书的源代码。 •Blackjack:一个二十一点游戏。 •Chessboard:使用 matplotlib 创建一个棋盘。 •Compound Interest Calculator:一个计算复利的 Python 脚本。 •Countdown Timer:当输入的时间过去时显示一条消息。 •Convert Temperature:一个 Python 脚本,用于在华氏度、摄氏度和开氏度之间转换温度。 •Crop Images:一个 Python 脚本,用于裁剪给定的图像。 •CSV to Excel:一个 Python 脚本,用于将 CSV 文件转换为 Excel 文件。 •Currency Script:一个 Python 脚本,用于将一个国家的货币转换为另一个国家的货币。 •Digital Clock:一个 Python 脚本,用于在终端中显示一个数字时钟。 •Display Popup Window:一个 Python 脚本,用于向用户预览一个 GUI 界面。 •Duplicate Finder:该脚本通过 MD5 哈希识别重复文件,并允许删除或重新定位文件。 •Emoji in PDF:一个 Python 脚本,用于在 PDF 中查看 Emoji。 •Expense Tracker:一个 Python 脚本,可以跟踪开支。 •Face Reaction:一个试图检测面部表情的脚本。 •Fake Profiles:创建虚假配置文件。 •File Encryption Decryption:使用 AES 算法对文件进行加密和解密,以确保安全性。 •Font Art:使用 Python 显示字体艺术。 •Freelance Helper Program:从包含工作时间的 Excel 文件中获取数据,并计算报酬。 •Get Hexcodes From Websites:从网站生成包含十六进制代码的 Python 列表。 •Hand_Volume:检测和跟踪手部动作,以控制音量。 •Harvest Predictor:接收一些必要的输入参数,并根据这些参数预测收成。 •Html-to-images:将 HTML 文档转换为图像文件。 •Image Capture:从网络摄像头捕捉图像并将其保存到本地设备。 •Image Compress:压缩图像。 •Image Manipulation without libraries:在不使用任何外部库的情况下,操作图像。 •Image Text:从图像中提取文本。 •Image Text to PDF:将图像和文本添加到 PDF 文件中。 •Image Watermarker:给图像添加水印。 •Image to ASCII:将图像转换为 ASCII 艺术。 •Image to Gif:从图像生成 GIF 文件。 •IP Geolocator:使用 IP 地址在地球上定位位置。 •Jokes Generator:生成笑话。 •JSON to CSV 1:将 JSON 转换为 CSV 文件。 •JSON to CSV 2:将 JSON 文件转换为 CSV 文件。 •JSON to CSV converter:将 JSON 文件转换为 CSV 文件。它还可以转换嵌套的 JSON 文件。示例 JSON 用于测试。 •JSON to YAML converter:将 JSON 文件转换为 YAML 文件。示例 JSON 用于测试。 •Keylogger:一个可以跟踪你的击键、剪贴板文本、定期截屏,并录制音频的键盘记录器。 •Keyword - Retweeting:查找包含给定关键字的最新推文,然后转发它们。 •LinkedIn Bot:自动搜索 LinkedIn 上的公开资料,并将数据导出到 Excel 表格。 •Mail Sender:发送电子邮件。 •Merge Two Images:水平或垂直合并两个图像。 •Mouse mover:每 15 秒移动一次鼠标。 •No Screensaver:防止屏幕保护程序开启。 •OTP Verification:一个 OTP 验证检查器。 •Password Generator:生成随机密码。 •Password Manager:生成和管理密码管理器。 •PDF to Audio:将 PDF 转换为音频。 •Planet Simulation:模拟多个行星绕太阳旋转。 •Playlist Exchange:一个 Python 脚本,用于在 Spotify 和 Python 之间交换歌曲和播放列表。 •PNG TO JPG CONVERTOR:一个 PNG 到 JPG 图片转换器。 •QR Code Generator:从提供的链接生成二维码。 •Random Color Generator:一个随机颜色生成器,会显示颜色和值! •Remove Background:删除图像的背景。 •Rock Paper Scissor 1:一个石头剪刀布游戏。 •Rock Paper Scissor 2:一个新的石头剪刀布游戏。 •Run Then Notify:运行一个缓慢的命令,并在执行完成后发送电子邮件通知。 •Selfie with Python:用 Python 拍照。 •Simple TCP Chat Server:在你的 LAN 上创建一个本地服务器,用于接收和发送消息! •Snake Water Gun:一个类似石头剪刀布的游戏。 •Sorting:冒泡排序算法。 •Star Pattern:创建一个星形图案金字塔。 •Take a break:在长时间工作时休息的 Python 代码。 •Text Recognition:一个图像文本识别 ML 模型,用于从图像中提取文本。 •Text to Image:一个 Python 脚本,它将你的文本转换为 JPEG 图片。 •Tic Tac Toe 1:一个井字棋游戏。 •Tik Tac Toe 2:一个井字棋游戏。 •Turtle Art & Patterns:用于查看海龟艺术的脚本,也有一些基于提示的脚本。 •Turtle Graphics:使用海龟图形的代码。 •Twitter Selenium Bot:一个可以与 Twitter 以多种方式交互的机器人。 •Umbrella Reminder:雨伞提醒。 •URL Shortener:一个 URL 缩短代码,将长 URL 压缩为更短、更容易管理的链接。 •Video Downloader:从 YouTube 下载视频到你的本地系统。 •Video Watermarker:给任何你选择的视频添加水印。 •Virtual Painter:虚拟绘画应用程序。 •Wallpaper Changer:自动更改主屏幕壁纸,并在上面添加随机引语和股票行情。 •Weather GUI:显示天气信息。 •Website Blocker:下载网站并在你的本地 IP 上的首页加载它。 •Website Cloner:克隆任何网站并在你的本地 IP 上打开它。 •Weight Converter:一个简单的 GUI 脚本,用于将重量转换为不同的计量单位。 •Wikipedia Data Extractor:一个简单的维基百科数据提取脚本,可以在你的 IDE 中获得输出。 •Word to PDF:一个 Python 脚本,用于将 MS Word 文件转换为 PDF 文件。 •Youtube Downloader:从 YouTube 下载任何视频,可以是视频格式或音频格式! •Pigeonhole Sort:算法,使用鸽巢排序算法来有效地排序数组! •Youtube Playlist Info Scraper:这个 python 模块使用播放列表链接检索 YouTube 播放列表的 JSON 格式信息。 •Gitpod:使用云端免费开发环境,可以直接开始编码。

    Python CheatSheet


Basics: Showing Output To User, Taking Input From the User, range Function
Comments: Single line comment, Multi-line comment
Escape Sequence: Newline, Backslash, Single Quote, Tab, Backspace, Octal value, Hex value, Carriage Return
Strings: String, Indexing, Slicing, isalnum() method, isalpha() method, isdecimal() method, isdigit() method, islower() method, isspace() method, isupper() method, lower() method, upper() method, strip() method
List: Indexing, Empty List, index method, append method, extend method, insert method, pop method, remove method, clear method, count method, reverse method, sort method
Tuples: Tuple Creation, Indexing, count method, index method
Sets: Set Creation: Way 1, Set Creation: Way 2, Set Methods, add() method, clear() method, discard() method, intersection() method, issubset() method, pop() method, remove() method, union() method
Dictionaries: Dictionary, Empty Dictionary, Adding Element to a dictionary, Updating Element in a dictionary, Deleting an element from a dictionary, Dictionary Functions & Methods, len() method, clear() method, get() method, items() method, keys() method, values() method, update() method
Indentation
Conditional Statements: if Statement, if-else Statement, if-elif Statement, Nested if-else Statement
Loops in Python: For Loop, While Loop, Break Statement, Continue Statement
Functions: Function Definition, Function Call, Return statement in Python function, Arguments in python function
File Handling: open() function, modes, close() function, read() function, write function
Exception Handling: try and except, else, finally
Object Oriented Programming (OOPS): class, Creating an object, self parameter, class with a constructor, Inheritance in python, Types of inheritance, filter function, issubclass function
Iterators and Generators: Iterator, Generator
Decorators: property Decorator (getter), setter Decorator, deleter Decorator


    Basics

    Basic syntax from the python programming language

     Showing Output To User

    print("Content that you wanna print on screen") var1 = "Shruti" print("Hi my name is: ",var1)

     Taking Input From the User

var1 = input("Enter your name: ") print("My name is: ", var1) To take input as an integer: var1 = int(input("Enter an integer value: ")) print(var1) To take input as a float: var1 = float(input("Enter a float value: ")) print(var1)

     range Function

The range function returns a sequence of numbers, e.g. the numbers 0 to n-1 for range(0, n). range(int_start_value, int_stop_value, int_step_value) Here the start value defaults to 0 and the step value defaults to 1 if not given by the programmer, but int_stop_value is a compulsory parameter of the range function. example Display all even numbers between 1 and 100 for i in range(2, 101, 2): print(i)


    Comments

Comments are used to make the code easier for programmers to understand; they are not executed by the interpreter.

     Single line comment

    # This is a single line comment

     Multi-line comment

    '''This is a multi-line comment'''


    Escape Sequence

    An escape sequence is a sequence of characters; it doesn't represent itself (but is translated into another character) when used inside string literal or character. Some of the escape sequence characters are as follows:

     Newline

    Newline Character print("\n")

     Backslash

    It adds a backslash print("\\")

     Single Quote

    It adds a single quotation mark print("\'")

     Tab

    It gives a tab space print("\t")

     Backspace

    It adds a backspace print("\b")

     Octal value

    It represents the value of an octal number print("\ooo")

     Hex value

    It represents the value of a hex number print("\xhh")

     Carriage Return

Carriage return, or \r, moves the cursor back to the beginning of the current line. print("\r")


    Strings

    Python string is a sequence of characters, and each character can be individually accessed using its index.

     String

    You can create Strings by enclosing text in both forms of quotes - single quotes or double quotes. variable_name = "String Data" example str="Shruti" print("string is ",str)

     Indexing

Every character in the string has a position (index): indexing starts at 0 and ends at length-1.

     Slicing

Slicing refers to obtaining a sub-string from the given string, using the syntax string_var[int_start_value:int_stop_value:int_step_value]. For example, var_name[1:5] includes the characters at indexes 1, 2, 3 and 4. The start and step values are taken as 0 and 1 respectively if not given by the programmer.
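A few concrete slices on a throwaway string, to make the rules above tangible:

s = "Hello, World!"
print(s[1:5])    # 'ello'  (indexes 1 to 4)
print(s[:5])     # 'Hello' (start defaults to 0)
print(s[7:])     # 'World!'
print(s[::2])    # 'Hlo ol!' (every second character)
print(s[::-1])   # '!dlroW ,olleH' (reversed)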

     isalnum() method

    Returns True if all the characters in the string are alphanumeric, else False string_variable.isalnum()

     isalpha() method

    Returns True if all the characters in the string are alphabets string_variable.isalpha()

     isdecimal() method

    Returns True if all the characters in the string are decimals string_variable.isdecimal()

     isdigit() method

    Returns True if all the characters in the string are digits string_variable.isdigit()

     islower() method

    Returns True if all characters in the string are lower case string_variable.islower()

     isspace() method

    Returns True if all characters in the string are whitespaces string_variable.isspace()

     isupper() method

    Returns True if all characters in the string are upper case string_variable.isupper()

     lower() method

    Converts a string into lower case equivalent string_variable.lower()

     upper() method

    Converts a string into upper case equivalent string_variable.upper()

     strip() method

    It removes leading and trailing spaces in the string string_variable.strip()


    List

    A List in Python represents a list of comma-separated values of any data type between square brackets. var_name = [element1, element2, ...] These elements can be of different datatypes

     Indexing

Every element in the list has a position (index): indexing starts at 0 and ends at length-1. A list is an ordered, indexed, mutable and very flexible, dynamic collection of elements in Python.

     Empty List

    This method allows you to create an empty list my_list = []

     index method

    Returns the index of the first element with the specified value list.index(element)

     append method

    Adds an element at the end of the list list.append(element)

     extend method

    Add the elements of a given list (or any iterable) to the end of the current list list.extend(iterable)

     insert method

    Adds an element at the specified position list.insert(position, element)

     pop method

    Removes the element at the specified position and returns it list.pop(position)

     remove method

    The remove() method removes the first occurrence of a given item from the list list.remove(element)

     clear method

    Removes all the elements from the list list.clear()

     count method

    Returns the number of elements with the specified value list.count(value)

     reverse method

    Reverses the order of the list list.reverse()

     sort method

    Sorts the list list.sort(reverse=True|False)
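A short sketch that exercises several of the list methods above on a throwaway list:

nums = [3, 1, 4, 1, 5]
nums.append(9)        # [3, 1, 4, 1, 5, 9]
nums.extend([2, 6])   # [3, 1, 4, 1, 5, 9, 2, 6]
nums.insert(0, 7)     # [7, 3, 1, 4, 1, 5, 9, 2, 6]
nums.remove(1)        # removes only the first 1 -> [7, 3, 4, 1, 5, 9, 2, 6]
print(nums.pop())     # 6 (removed from the end)
print(nums.count(1))  # 1
nums.sort()           # [1, 2, 3, 4, 5, 7, 9]
nums.reverse()        # [9, 7, 5, 4, 3, 2, 1]
print(nums)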


    Tuples

    Tuples are represented as comma-separated values of any data type within parentheses.

     Tuple Creation

    variable_name = (element1, element2, ...) These elements can be of different datatypes

     Indexing

Every element in the tuple has a position (index): indexing starts at 0 and ends at length-1. Tuples are ordered, indexed, immutable and therefore a comparatively safe collection of elements. Let's talk about some of the tuple methods:

     count method

    It returns the number of times a specified value occurs in a tuple tuple.count(value)

     index method

    It searches the tuple for a specified value and returns the position. tuple.index(value)


    Sets

    A set is a collection of multiple values which is both unordered and unindexed. It is written in curly brackets.

     Set Creation: Way 1

    var_name = {element1, element2, ...}

     Set Creation: Way 2

var_name = set([element1, element2, ...]) A set is an unordered, unindexed collection; the set itself can be modified, but its elements must be immutable (hashable), and duplicate elements are not allowed.

     Set Methods

Let's talk about some of the methods of sets:

     add() method

    Adds an element to a set set.add(element)

     clear() method

    Remove all elements from a set set.clear()

     discard() method

    Removes the specified item from the set set.discard(value)

     intersection() method

    Returns intersection of two or more sets set.intersection(set1, set2 ... etc)

     issubset() method

    Checks if a set is a subset of another set set.issubset(set)

     pop() method

    Removes an element from the set set.pop()

     remove() method

    Removes the specified element from the set set.remove(item)

     union() method

    Returns the union of two or more sets set.union(set1, set2...)


    Dictionaries

A dictionary is a collection of comma-separated key:value pairs within {}, with the requirement that no two keys within a dictionary can be the same. Since Python 3.7, dictionaries preserve insertion order.

     Dictionary

<dictionary-name> = {<key>: value, <key>: value ...} A dictionary is an insertion-ordered and mutable collection of elements. Dictionaries allow duplicate values but not duplicate keys.

     Empty Dictionary

    By putting two curly braces, you can create a blank dictionary mydict={}

     Adding Element to a dictionary

    By this method, one can add new elements to the dictionary <dictionary>[<key>] = <value>

     Updating Element in a dictionary

    If a specified key already exists, then its value will get updated <dictionary>[<key>] = <value>

     Deleting an element from a dictionary

    del keyword is used to delete a specified key:value pair from the dictionary as follows: del <dictionary>[<key>]

     Dictionary Functions & Methods

    Below are some of the methods of dictionaries

     len() method

    It returns the length of the dictionary, i.e., the count of elements (key: value pairs) in the dictionary len(dictionary)

     clear() method

    Removes all the elements from the dictionary dictionary.clear()

     get() method

    Returns the value of the specified key dictionary.get(keyname)

     items() method

    Returns a list containing a tuple for each key-value pair dictionary.items()

     keys() method

    Returns a list containing the dictionary's keys dictionary.keys()

     values() method

    Returns a list of all the values in the dictionary dictionary.values()

     update() method

    Updates the dictionary with the specified key-value pairs dictionary.update(iterable)
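A short sketch tying the dictionary operations above together, with made-up values:

person = {"name": "Shruti", "age": 21}
person["city"] = "Delhi"                    # add a new key
person["age"] = 22                          # update an existing key
print(person.get("name"))                   # Shruti
print(person.get("email", "not set"))       # default when the key is missing
print(list(person.keys()))                  # ['name', 'age', 'city']
person.update({"age": 23, "email": "x@example.com"})
del person["city"]
print(len(person))                          # 3
print(list(person.items()))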


    Indentation

In Python, indentation means that code is written with leading spaces or tabs to group statements into blocks, so that the interpreter can execute the code correctly. Indentation is applied to conditional statements, loops, and function and class bodies; the indent specifies the block of code that is executed depending on the condition.


    Conditional Statements

    The if, elif and else statements are the conditional statements in Python, and these implement selection constructs (decision constructs).

     if Statement

    if(conditional expression): statements

     if-else Statement

    if(conditional expression): statements else: statements

     if-elif Statement

    if (conditional expression): statements elif (conditional expression): statements else: statements

     Nested if-else Statement

    if (conditional expression): if (conditional expression): statements else: statements else: statements example a=15 b=20 c=12 if(a>b and a>c): print(a,"is greatest") elif(b>c and b>a): print(b," is greatest") else: print(c,"is greatest")


    Loops in Python

    A loop or iteration statement repeatedly executes a statement, known as the loop body, until the controlling expression is false (0).

     For Loop

    The for loop of Python is designed to process the items of any sequence, such as a list or a string, one by one. for <variable> in <sequence>: statements_to_repeat example for i in range(1,101,1): print(i)

     While Loop

    A while loop is a conditional loop that will repeat the instructions within itself as long as a conditional remains true. while <logical-expression>: loop-body example i=1 while(i<=100): print(i) i=i+1

     Break Statement

    The break statement enables a program to skip over a part of the code. A break statement terminates the very loop it lies within. for <var> in <sequence>: statement1 if <condition>: break statement2 statement_after_loop example for i in range(1,101,1): print(i ,end=" ") if(i==50): break else: print("Mississippi") print("Thank you")

     Continue Statement

    The continue statement skips the rest of the loop statements and causes the next iteration to occur. for <var> in <sequence>: statement1 if <condition> : continue statement2 statement3 statement4 example for i in [2,3,4,6,8,0]: if (i%2!=0): continue print(i)


    Functions

    A function is a block of code that performs a specific task. You can pass parameters into a function. It helps us to make our code more organized and manageable.

     Function Definition

    def my_function(): #statements def keyword is used before defining the function. 

     Function Call

my_function() Whenever we need that block of code in our program, we simply call the function by name wherever it is needed. If parameters were declared when defining the function, we have to pass arguments when calling it. example def add(): #function definition a=10 b=20 print(a+b) add() #function call

     Return statement in Python function

The return statement returns the specified value or expression to the caller. return [value/expression]

     Arguments in python function

    Arguments are the values passed inside the parenthesis of the function while defining as well as while calling. def my_function(arg1,arg2,arg3....argn): #statements my_function(arg1,arg2,arg3....argn) example def add(a,b): return a+b x=add(7,8) print(x)


    File Handling

    File handling refers to reading or writing data from files. Python provides some functions that allow us to manipulate data in the files.

     open() function

    var_name = open("file name", " mode")

     modes-

    1. r - to read the content from file
    2. w - to write the content into file
    3. a - to append the existing content into file
4. r+: to read and write data in the file; the existing content is not truncated, but writes overwrite the file from the current position.
    5. w+: To write and read data. It will override existing data.
    6. a+: To append and read data from the file. It won’t override existing data.

     close() function

    var_name.close()

     read () function

The read family contains three methods: read(), readline() and readlines(). read() returns the whole file as one big string; readline() returns one line at a time; readlines() returns a list of the file's lines.

     write function

These functions write text to the file: write() writes a single string (a fixed sequence of characters) to the file, and writelines() writes a list of strings to the file.
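A small end-to-end sketch of the open/write/read functions above, using a with statement so the file is closed automatically; notes.txt is just a placeholder file name:

# write, then read back
with open("notes.txt", "w") as f:
    f.write("first line\n")
    f.writelines(["second line\n", "third line\n"])

with open("notes.txt", "r") as f:
    print(f.read())          # the whole file as one string

with open("notes.txt", "r") as f:
    print(f.readline())      # just the first line
    print(f.readlines())     # the remaining lines as a list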


    Exception Handling

    An exception is an unusual condition that results in an interruption in the flow of a program.

     try and except

A basic try-except block in Python. When the try block raises an error, control passes to the except block. try: [statement body block] raise Exception() except ExceptionName: [error processing block]

     else

The else block is executed only if the try block has not raised any exception and the code ran successfully. try: #statements except: #statements else: #statements


    finally

The finally block is executed whether the try block completed successfully or the except block ran; a finally block is always executed.
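A compact sketch showing try, except, else and finally working together:

def safe_divide(a, b):
    try:
        result = a / b
    except ZeroDivisionError:
        print("Cannot divide by zero")
        return None
    else:
        # runs only when no exception was raised
        return result
    finally:
        # runs in every case, exception or not
        print("division attempted")

print(safe_divide(10, 2))   # prints "division attempted", then 5.0
print(safe_divide(10, 0))   # prints "Cannot divide by zero", "division attempted", then None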


    Object Oriented Programming (OOPS)

    It is a programming approach that primarily focuses on using objects and classes. The objects can be any real-world entities.

     class

    The syntax for writing a class in python class class_name: pass #statements

     Creating an object

    Instantiating an object can be done as follows: <object-name> = <class-name>(<arguments>)

     self parameter

The self parameter is the first parameter of any function defined inside a class. It can be given a different name, but this parameter is mandatory when defining any method of a class, because it is used to access the other data members of the class.

     class with a constructor

A constructor is the special method of a class that is used to initialize its objects. The syntax for writing a class with a constructor in python:
class CodeWithHarry:
    # Default constructor
    def __init__(self):
        self.name = "CodeWithHarry"

    # A method for printing data members
    def print_me(self):
        print(self.name)

     Inheritance in python

By using inheritance, we can create a class that uses all the properties and behaviour of another class. The new class is known as the derived (child) class, and the one whose properties are acquired is known as the base (parent) class. It provides reusability of code. class Base_class: pass class derived_class(Base_class): pass
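A small sketch of single inheritance with method overriding, using made-up class names:

class Animal:
    def __init__(self, name):
        self.name = name

    def speak(self):
        return f"{self.name} makes a sound"

class Dog(Animal):                 # Dog inherits from Animal
    def speak(self):               # overrides the parent method
        return f"{self.name} barks"

d = Dog("Rex")
print(d.speak())                   # Rex barks
print(isinstance(d, Animal))       # True
print(issubclass(Dog, Animal))     # True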

     Types of inheritance-

  • Single inheritance
  • Multiple inheritance
  • Multilevel inheritance
  • Hierarchical inheritance

 filter function

    The filter function allows you to process an iterable and extract those items that satisfy a given condition filter(function, iterable)

     issubclass function

Used to find whether a class is a subclass of a given class, as follows: issubclass(cls, classinfo) # returns True if cls is a subclass of classinfo


    Iterators and Generators

Here are some of the more advanced topics of the Python programming language, like iterators and generators.

     Iterator

    Used to create an iterator over an iterable iter_list = iter(['Harry', 'Aakash', 'Rohan']) print(next(iter_list)) print(next(iter_list)) print(next(iter_list))

     Generator

Used to generate values on the fly:
# A simple generator function
def my_gen():
    n = 1
    print('This is printed first')
    # Generator function contains yield statements
    yield n

    n += 1
    print('This is printed second')
    yield n

    n += 1
    print('This is printed at last')
    yield n
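And a short note on consuming the generator defined above, either with next() or by iterating over it:

gen = my_gen()
print(next(gen))   # prints 'This is printed first', then 1
print(next(gen))   # prints 'This is printed second', then 2

# or simply loop over a fresh generator
for value in my_gen():
    print(value)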


    Decorators

Decorators are used to modify the behaviour of a function or a class. They are usually declared just before the definition of the function you want to decorate.

     property Decorator (getter)

    @property def name(self): return self.__name

     setter Decorator

    It is used to set the property 'name' @name.setter def name(self, value): self.__name=value

     deleter Decorator

It is used to delete the property 'name'
@name.deleter   # property-name.deleter decorator
def name(self):
    print('Deleting..')
    del self.__name
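A small class that puts the getter, setter and deleter decorators above together; Person and its attribute are made-up names:

class Person:
    def __init__(self, name):
        self.__name = name

    @property
    def name(self):              # getter: person.name
        return self.__name

    @name.setter
    def name(self, value):       # setter: person.name = "..."
        self.__name = value

    @name.deleter
    def name(self):              # deleter: del person.name
        print('Deleting..')
        del self.__name

p = Person("Harry")
print(p.name)     # Harry
p.name = "Aakash"
print(p.name)     # Aakash
del p.name        # Deleting..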

Killer automation scripts



1. Image optimizer

# Image optimizer
# Install the Pillow library first
# pip install Pillow
from PIL import Image, ImageEnhance, ImageFilter

# open an image
im = Image.open("Image1.jpg")
# crop
im_cropped = im.crop((34, 23, 100, 100))
# resize
im_resized = im.resize((50, 50))
# flip
im_flipped = im.transpose(Image.FLIP_LEFT_RIGHT)
# blur
im_blurred = im.filter(ImageFilter.BLUR)
# sharpen
im_sharpened = im.filter(ImageFilter.SHARPEN)
# adjust brightness
enhancer = ImageEnhance.Brightness(im)
im_brightened = enhancer.enhance(1.5)  # 50% brighter
# save the processed images
im_cropped.save("image_cropped.jpg")
im_resized.save("image_resized.jpg")
This script uses the Pillow library to process images: cropping, resizing, flipping, blurring, sharpening and adjusting brightness. These operations let you edit photos quickly without relying on heavyweight software.
Suppose you have just come back from a trip with a pile of photos to organize: you want to crop some of them to the right size and brighten them before sharing with friends. This script helps you finish those tasks quickly and makes the photos look more polished.

2. Video optimizer

# Video optimizer
# Install the MoviePy library first
# pip install moviepy
import moviepy.editor as pyedit

# load the video
video = pyedit.VideoFileClip("vid.mp4")
# trim the video
vid1 = video.subclip(0, 10)
vid2 = video.subclip(20, 40)
final_vid = pyedit.concatenate_videoclips([vid1, vid2])
# speed the video up
final_vid = final_vid.speedx(2)
# add an audio track to the video
aud = pyedit.AudioFileClip("bg.mp3")
final_vid = final_vid.set_audio(aud)
# save the video
final_vid.write_videofile("final_video.mp4")
The code above uses the MoviePy library to process video: trimming, concatenating, adding audio and speeding clips up. This makes video editing simple and efficient without a complex GUI.
Imagine you shot a long travel video but only want to share a small part of it. With this script you can cut out the highlights and add background music, producing an impressive short clip to share with family and friends.

3. PDF to images

# PDF to images
# Install the PyMuPDF library first
# pip install PyMuPDF
import fitz

def pdf_to_images(pdf_file):
    doc = fitz.open(pdf_file)
    for p in doc:
        pix = p.get_pixmap()
        output = f"page{p.number}.png"
        pix.save(output)  # older PyMuPDF versions used pix.writePNG(output)

# convert a PDF file
pdf_to_images("test.pdf")
This script uses the PyMuPDF library to render every page of a PDF file as a PNG image, which is useful whenever you need to extract images or page content from a PDF.
If you receive a PDF containing several images and want to save them individually, this script extracts all the pages quickly and spares you from taking screenshots by hand.

4. Fetching API data

# Fetching API data
# Install the requests library first
# pip install requests
import requests

# set the API URL
url = "https://api.github.com/users/psf/repos"
# send a GET request
response = requests.get(url)
# check whether the request succeeded
if response.status_code == 200:
    repos = response.json()  # parse the returned JSON into Python objects
    for repo in repos:
        print(f"Repository Name: {repo['name']}, Stars: {repo['stargazers_count']}")
else:
    print(f"Request failed with status code: {response.status_code}")
This script uses the requests library to fetch a user's repositories from the GitHub API, checks the request status and extracts the relevant fields. Talking to APIs like this is a core part of modern web application development.
Suppose you are evaluating how popular an open-source project is: with this script you can quickly pull repository names and star counts and judge the project's activity and community support.

5. Battery level notifier

# Battery level notifier
# Install the plyer and psutil libraries first
# pip install plyer psutil
from plyer import notification
import psutil
from time import sleep

while True:
    battery = psutil.sensors_battery()
    life = battery.percent
    if life < 20:
        notification.notify(
            title="Battery low",
            message="Please plug in the charger!",
            timeout=10
        )
    sleep(60)  # check every 60 seconds
This script monitors the battery state with the psutil library and sends a desktop notification through plyer. When the charge drops below 20% it reminds the user to plug in, which is a genuinely useful piece of system monitoring.
You are in an online meeting and suddenly the battery is running low; this script reminds you to charge in time so your work is not interrupted by a dead laptop.

6. Grammar fixer

# Grammar fixer
# Install the happytransformer library first
# pip install happytransformer
from happytransformer import HappyTextToText as HappyTTT
from happytransformer import TTSettings

def grammar_fixer(text):
    grammar_model = HappyTTT("T5", "prithivida/grammar_error_correcter_v1")
    config = TTSettings(do_sample=True, top_k=10, max_length=100)
    corrected = grammar_model.generate_text(text, args=config)
    print("Corrected Text:", corrected.text)

text_to_fix = "This is smple tet we how know this"
grammar_fixer(text_to_fix)
This script uses a HappyTransformer text-to-text model to detect and correct grammatical errors in a piece of text, relying on a machine-learning model to spot and fix mistakes so the writing reads more smoothly.
If you are drafting an article and are not confident about the grammar, this script helps you check and fix errors quickly so the text looks more professional.

7. Spelling correction

# Spelling correction
# Install the textblob library first
# pip install textblob
from textblob import TextBlob

# correct the spelling of a paragraph
def fix_paragraph(paragraph):
    sentence = TextBlob(paragraph)
    correction = sentence.correct()
    print(correction)

# correct the spelling of a single word
def fix_word(word):
    from textblob import Word
    corrected_word = Word(word).correct()
    print(corrected_word)

fix_paragraph("This is sammple tet!!")
fix_word("maangoo")
This script uses the TextBlob library to correct spelling mistakes, either for a whole paragraph or word by word, improving the quality of the writing.
When you are composing a social media post and worry that typos might hurt readability, this script checks and fixes the text for you.

8. Internet download manager

# Internet download manager
# Install the internetdownloadmanager library first
# pip install internetdownloadmanager
import internetdownloadmanager as idm

def downloader(url, output):
    pydownloader = idm.Downloader(worker=20,
                                  part_size=1024 * 1024 * 10,
                                  resumable=True)
    pydownloader.download(url, output)

downloader("http://example.com/image.jpg", "image.jpg")
downloader("http://example.com/video.mp4", "video.mp4")
This script builds a simple download manager on top of the internetdownloadmanager library. It downloads files in multiple parts with several workers, which can speed up large downloads.
When downloading large or multiple files, especially over an unstable connection, this script helps complete the downloads efficiently and keeps them resumable.

9. Fetching world news

# Fetching world news
# Install the requests library first
# pip install requests
import requests

ApiKey = "YOUR_API_KEY"
url = f"https://api.worldnewsapi.com/search-news?text=hurricane&api-key={ApiKey}"
headers = {
    'Accept': 'application/json'
}
response = requests.get(url, headers=headers)
print("News:", response.json())
This script uses the requests library to pull live news data from a news API; by changing the request parameters you can query news on any topic you like.
If you follow current events, this script fetches the latest coverage for you so you can keep up with the world without searching the big news sites by hand.

10. PySide6 GUI

# PySide6 GUI
# Install the PySide6 library first
# pip install PySide6
from PySide6.QtWidgets import (QApplication, QWidget, QPushButton, QLabel, QLineEdit,
                               QRadioButton, QCheckBox, QSlider, QProgressBar)
import sys

app = QApplication(sys.argv)
window = QWidget()
# resize the window
window.resize(500, 500)
# set the window title
window.setWindowTitle("PySide6 Window")
# add a button
button = QPushButton("Click Me", window)
button.move(200, 200)
# add a label
label = QLabel("Hello Medium", window)
label.move(200, 150)
# add an input box
input_box = QLineEdit(window)
input_box.move(200, 250)
# add a radio button
radio_button = QRadioButton("Radio Button", window)
radio_button.move(200, 300)
# add a checkbox
checkbox = QCheckBox("Checkbox", window)
checkbox.move(200, 350)
# add a slider
slider = QSlider(window)
slider.move(200, 400)
# add a progress bar
progress_bar = QProgressBar(window)
progress_bar.move(200, 450)
# show the window
window.show()
sys.exit(app.exec())
This script uses the PySide6 library (Qt for Python) to create a simple graphical user interface and shows how to add the usual widgets: buttons, labels, input boxes, radio buttons, checkboxes, sliders and progress bars.
If you want a user-friendly front end for a project or an everyday tool, this script gives you a quick starting point for an interactive window.